(54) Title: DIAGNOSIS AND TREATMENT OF ALZHEIMER'S DISEASE

(57) Abstract: A gene has been identified on human chromosome 21, in the region 21q11-q21. The gene has
been sequenced and shown to have sequence homologies with the ubiquitin specific protease family. The recombinant gene has been
cloned into E. coli, and the product shown to have ubiquitin specific protease properties. It is believed that the activity of the enzyme
may be involved in the pathway leading to the neurofibrillary tangles observed in Alzheimer's disease and/or in general neurotoxicity
leading to progressive neuronal degeneration and cell death. The invention relates to processes for diagnosing Alzheimer's disease,
for treating Alzheimer's disease and for investigating the mechanism of Alzheimer's disease. The sequence of the enzyme has
GenBank accession No. AF134213. The gene has the name USP25, approved by the HUGO Nomenclature Committee.

Diagnosis and Treatment of Alzheimer's Disease



The present invention relates to Alzheimer's Disease, a possible
pathogenesis of the disease, processes for investigation and diagnosis of
the disease, and for treatment of the disease. In particular the invention
relates to the discovery of a gene, and the activity of its product, on human
chromosome 21.

Background of the Invention 


Alzheimer's disease (AD) 



Alzheimer's disease (AD) (Alzheimer A.) is the most common form of
dementia in humans, affecting up to 5% of the human population above the age
of 65 (Terry RD et al.). Less than 15% of all AD cases are of a familial, autosomally
inherited form, and more than half of these are explained by causative
mutations in the presenilin 1 (PS-1) gene on chromosome 14. A minority of familial
cases are found to carry causative mutations in the presenilin 2 (PS-2) gene on
chromosome 1 or in the amyloid precursor protein (APP) gene on chromosome 21. The
majority of AD cases (>85%) are known as sporadic AD, although the
distinction between familial and sporadic AD becomes very elusive in
late onset AD (age of onset >70), owing to an insufficient number of testable
living relatives. It is therefore possible that sporadic AD, especially the
most prevalent late onset type, is also an unrecognised genetic disease.

The main pathohistological findings that define a diagnosis of AD are
two kinds of neuropathological protein deposits in AD brains, in particular in
the hippocampus, neocortex and amygdala: neurofibrillary tangles (NFT) and
amyloid plaques (AP). Amyloid plaques are extracellular accumulations of
amyloid material which occur in normal ageing brains (senile plaques), but
are much more prominent in AD, and do not occur in other
neurodegenerative dementias. Neurofibrillary tangles are large
intraneuronal, usually perikaryal masses that are never observed in neurons
from normal brains of any age, but are only visible in neurons of brains
affected by neurodegenerative diseases including AD.

All individuals with Down syndrome (DS) develop early AD changes in their
brains

Trisomy 21 (DS) individuals are known to be at risk of developing AD
in middle life, though the clinical picture may in some cases be obscured by
the pre-existing mental retardation characteristic of DS. The brains of DS
individuals in virtually 100% of cases exhibit neuropathology
indistinguishable from AD by the end of the fourth decade of life. This
includes NFTs and their major ultrastructural component, the PHFs, as well
as amyloid plaques. Immunohistochemical studies have shown that amyloid
is deposited in the brains of trisomy 21 individuals about 50 years before it is
seen in the normal ageing population. In normal brains, soluble Aβ40 is
detectable at low levels. These levels increase with ageing, but do not
necessarily lead to AD-like pathology. The highly insoluble Aβ42 is detected in
the brains of ageing euploid individuals only with the full AD neuropathology
and clinical AD. However, in DS brains (Teller), Aβ42 precedes Aβ40 but is
somehow kept soluble until the third and fourth decades of life, when the full
AD neuropathology develops, including NFTs and APs.

Overexpression of βAP produces amyloid plaques, but not NFT, in AD
model animals

Amyloid β-peptide is known to play a major role in the pathogenesis of
AD. Aβ is derived by proteolytic processing from the β-amyloid precursor
protein (βAPP), encoded by a gene located on human chromosome
21q21.2.

A number of studies have attempted to generate a mouse model with
typical pathohistological changes of AD. The most common approach is
over-expression of the APP gene product. Constructs driven by powerful
promoters have been introduced into the mouse genome, leading to
over-expression of the normal form of the APP holoprotein (refs 4-10 in Games), of the
carboxy-terminal peptides of APP, including the part from which Aβ (βAP) is
carved (Shoji), or of APP genes bearing well-established mutations
corresponding to those causing some types of familial early onset AD
(FEOAD) (Games, Calhoun). The study by Games succeeded in producing
widespread deposits of βAP resembling the amyloid plaques of AD, but in
addition to over-expressing the mutated form of APP, this construct was
expressed 18-fold more highly than in any previous attempt, owing to the choice of
promoter. The attempt by Calhoun generated plaques as well as evidence of
widespread neuronal loss in the hippocampus and neocortex of transgenic
mice. There was no evidence whatsoever in any of the above-mentioned
models of the presence of PHF or NFT, or of staining with antibodies against
any epitopes characteristic of PHF or NFT in the transgenic mice. The
outcome of these attempts strongly indicates that overexpression of APP alone
is not sufficient to explain the full histopathology of AD, which
nevertheless occurs in 100% of trisomy 21 cases. The most obvious
explanation is that the elevated dose of other chromosome 21-encoded
gene(s) must play a role, perhaps together with that of APP.

The only mouse neuronal cells in which the presence of epitopes
characteristic of PHFs and NFTs (such as τ-NFT and ubiquitin), in addition
to staining for βAP, was demonstrated (Richards) are neurons
transplanted into normal mouse recipients, originating from the developing
brains of mice trisomic for the whole of mouse chromosome 16, part of which
is syntenic to most of the long arm of human chromosome
21 (containing DNA analogous to the human regions from 21q11 through to
21q22.1). The donor mice, however, die in utero owing to severe malformations
of several organ systems, and are an imperfect model for DS, because of the
trisomy of portions of mouse chromosome 16 that are syntenic to other human
chromosomes, and because they lack trisomy of the segments syntenic
to the most telomeric portion of human chromosome 21 (21q22.2-qter).
Though highly disputed, and not reproducible in all mouse strains (Lane et al.
1996), the study by Richards hinted at the possibility that trisomy of the
human segment 21q11-21q22.1 (which includes APP) could be all that is
necessary for the generation of both major AD hallmarks: PHF
(NFT) and amyloid plaques. However, when Ts65Dn mice were generated
(Reeves), trisomic for two thirds of this segment (from and including APP
to the end of the segment in 21q22.1), neither amyloid nor PHF (NFT)
staining or pathology was observed, despite an increased dose of APP
mRNA, as demonstrated by Northern blot hybridisation, in these mice, which
live to adulthood. This result suggests that trisomy of genes in the most
proximal part of 21q (21q11-q21) could play a central role in causing AD
pathology in DS, both in generating the NFT (PHF) and, perhaps in
combination with APP overexpression, in generating the βAP deposits and
plaques.



Paired Helical Filaments (PHF) are aggregates of ubiquitin (Ub) and
altered Tau (τ)



The progressive formation of 10-nm thick filaments wound into a helix
with a half-periodicity of 80 nm (thus termed paired helical filaments or
PHFs) is one of the most typical and pathognomonic pathohistological
features characterizing AD. The PHFs accumulate as large perikaryal
masses called neurofibrillary tangles (NFTs), or occur as small bundles in
dystrophic neurites that form plaques in AD brain. Biochemical analysis of
the core material of the PHF, through analysis of the pronase-resistant core of
the filament and of a proteinaceous smear by SDS-PAGE and immunoblotting,
revealed that the PHF is composed of only two proteins: ubiquitin and Tau
(τ).

Several independent reports found NFTs and PHFs staining strongly and
specifically with antibodies against ubiquitin (Mori, Perry, Manetto).
One report studied inclusion bodies characteristic of other
neurodegenerative diseases (Manetto) and found ubiquitin strongly present
not only in the PHFs of AD, but also in the Pick bodies of Pick disease, the Lewy bodies
of Parkinson disease and the NFTs of progressive supranuclear palsy (PSP).
However, ubiquitin was not present in Hirano bodies or in granulovacuolar
degeneration, indicating that although present in a wide variety of
intracellular inclusion bodies of degenerating neurons, ubiquitin was not
indiscriminately present in all neuronal inclusions.

Ubiquitin (Ub) is a 76 amino acid polypeptide present in all eukaryotic
cells, and highly conserved in evolution. Ubiquitin is conjugated to a target
protein through an isopeptide bond between the C-terminal glycine of ubiquitin
and the ε-amino group of a Lys in the target protein (ubiquitination), a process
mediated by three groups of enzymes: ubiquitin activating enzymes (E1),
ubiquitin conjugating enzymes (E2) and ubiquitin ligases (E3). Ubiquitinated
proteins exist in a monoubiquitinated form or carry a multiubiquitin chain: the
former is not a degradation signal, while the latter, a Lys48-linked
ubiquitin-ubiquitin(n) conjugate, works as a strong degradation signal when joined
to a Lys in a target protein. Protein conjugated to polyubiquitin is then very rapidly and
efficiently degraded by non-lysosomal, ATP-dependent degradation by the
26S proteasome. Two classes of enzymes, termed ubiquitin specific
proteases (USPs) and ubiquitin C-terminal hydrolases (UCHs), are capable
of de-conjugating the ubiquitin-ubiquitin and ubiquitin-protein links, thereby
converting polyubiquitin into mono-ubiquitin, and de-coupling
(deubiquitinating) ubiquitin from the target protein, with the result of
preventing the degradation of the target protein.
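
The conjugation and de-conjugation steps described above can be summarised as simple bookkeeping. The Python sketch below is a toy model, not a kinetic simulation: the class and method names are hypothetical, and the threshold of four Lys48-linked ubiquitins for a degradation signal is an assumption chosen for illustration (the text states only that a multiubiquitin chain, unlike mono-ubiquitination, acts as a degradation signal).

```python
from dataclasses import dataclass, field

# Assumption for illustration only: a Lys48-linked chain of at least this
# many ubiquitins is treated as a proteasomal degradation signal.
DEGRADATION_THRESHOLD = 4


@dataclass
class Substrate:
    """Toy target protein; `chains` maps lysine position -> chain length."""
    name: str
    chains: dict = field(default_factory=dict)

    def ubiquitinate(self, lys: int, n: int = 1) -> None:
        """E1/E2/E3 step: add n ubiquitins to the chain on lysine `lys`."""
        self.chains[lys] = self.chains.get(lys, 0) + n

    def usp_trim(self, lys: int) -> int:
        """USP-like step: cut Ub-Ub links, leaving mono-Ub; return Ub released."""
        length = self.chains.get(lys, 0)
        if length <= 1:
            return 0
        self.chains[lys] = 1
        return length - 1

    def usp_deubiquitinate(self, lys: int) -> int:
        """USP-like step: remove the whole chain from `lys`; return Ub released."""
        return self.chains.pop(lys, 0)

    def is_degradation_signal(self) -> bool:
        """True if any chain reaches the (assumed) degradation threshold."""
        return any(n >= DEGRADATION_THRESHOLD for n in self.chains.values())


if __name__ == "__main__":
    target = Substrate("hypothetical target")
    target.ubiquitinate(lys=10, n=4)       # polyubiquitinated
    print(target.is_degradation_signal())  # True
    print(target.usp_trim(10))             # 3 ubiquitins released
    print(target.is_degradation_signal())  # False: mono-Ub only
    print(target.usp_deubiquitinate(10))   # 1 ubiquitin released
    print(target.chains)                   # {}
```

In this toy model a USP acting on a polyubiquitinated substrate leaves it mono-ubiquitinated: still tagged, but below the assumed degradation threshold, and both the substrate and the released ubiquitin are recycled.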

The class of UCH enzymes tends to include relatively small proteins
(about 35-40 kDa) which have low specificity for ubiquitinated proteins. BAP1
is a UCH but is unusual in that it is larger, with a molecular weight of nearly 100 kDa.
Several such enzymes are known, the sequences of which show some
sequence homologies, especially in two domains, the Cys and His domains.

Very broadly, two main functions have been observed among the
various members of the USP (UBP) superfamily (Wilkinson 1997, Wilkinson
and Hochstrasser 1998). The first is the generation of free ubiquitin from
precursor fusion proteins or from peptide-linked polyubiquitin after
proteolysis of the targeted protein, and the second is de-ubiquitination.
When a protein is targeted for ubiquitin-mediated degradation, it is linked to
ubiquitin via an isopeptide bond between the C-terminus of ubiquitin and
lysine ε-amino group(s) of the acceptor protein. Once the conjugate is
formed, it can have only two fates: non-lysosomal proteolysis mediated by
the 26S proteasome, resulting in total protein degradation, or de-conjugation
from ubiquitin (de-ubiquitination), resulting in the rescue of the target protein
from degradation (Wilkinson 1997, Wilkinson and Hochstrasser 1998).
The PHFs stain strongly for a sub-fragment of τ, the τ-NFT.

This aberrant form of τ has an enormously high affinity for τ itself, much
higher than the physiological affinity of τ for microtubules. Therefore, once the
threshold concentration of τ-NFT is reached, aggregation leading to PHFs
becomes inevitable, and the structural assembly process is relatively well
explained. What is not clear is what causes the initial generation of the
τ-NFT.

Each single molecule of the amino-terminally cleaved τ-NFT is
ubiquitinated at a Lys residue with a single Ub molecule. Ubiquitination
with a single Ub is an insufficient signal to trigger 26S proteasome
degradation, which normally requires a polyubiquitinated target protein. USPs
are capable of hydrolysing the isopeptide bonds between individual ubiquitins and
reversing poly-Ub back to mono-Ub. The same enzymes also cleave the
single Ub off the substrate, recycling both the substrate and Ub.

Screening of cDNA libraries prepared from the frontal cortex of an AD
patient and from fetal brain revealed the microtubule-associated protein τ
(Goedert). In PHFs, an altered form of τ was detected, which was processed
at the amino-terminal end, leaving only the carboxy-terminal third of the
polypeptide (Wischik). This altered form, termed PHF-τ, was found to bind
the insoluble core of the PHF very strongly, more than 20 times more strongly
than the physiological affinity of full-length τ for microtubules (Wischik,
Beyreuther). It was further revealed that ubiquitin present within PHFs is
conjugated to amino-terminally processed τ, predominantly in the
mono-ubiquitinated form, with the conjugation sites localized to the
microtubule-binding region (Morishima-Kawashima). The same authors suggested that the
ubiquitin-degradation system could be affected in the degenerating neurons
of AD brain.

E6AP (Human Papilloma Virus E6 associated protein) functions as
one of the two ubiquitin ligases so far detected (attaching ubiquitin and
labelling for degradation) for the master tumour suppressor p53 (Scheffner et
al. 1993, Haupt et al. 1997, Kubbutat et al. 1997, Lane 1998).
Lack of functional E6AP accelerates polyglutamine-induced
neuronal cell death in the mouse model for the neurodegenerative disease
spinocerebellar ataxia type 1 (SCA1) (Cummings et al. 1999). Lack of the E6AP
gene in a mouse expressing the polyglutamine stretch mutation of the SCA1
protein dramatically reduces the presence of ubiquitinated intranuclear
neuronal inclusions, but drastically accelerates neuronal degeneration
and cell death (Cummings et al. 1999). A very similar effect has been
observed in Huntington's Disease (HD), where a dominant negative mutant
of a ubiquitin conjugating enzyme (UBC3), when co-expressed in cultured
neurons with huntingtin protein bearing the polyglutamine extension
(the mutation causing HD), drastically reduces the presence of ubiquitinated
intranuclear neuronal inclusions, but drastically accelerates neuronal
degeneration and cell death (Saudou et al. 1998). If USP25 de-ubiquitinates
a set of target proteins similar to those ubiquitinated by E6AP, then
overexpression of USP25 may have effects similar to those of inhibiting
ubiquitin conjugation by E6AP.

Lowe et al. (1990) show that a ubiquitin carboxyl-terminal hydrolase is
selectively present in inclusion bodies characteristic of human
neurodegenerative diseases. The authors used immunohistochemical
methods to determine the presence of the UCH. They found
immunoreactivity in Lewy bodies but not in the NFTs of AD.

In recent years a number of other protein-modifying polypeptide tags
have been identified. Many of these are related to ubiquitin and have high
levels of identity and similarity (determined using the BLAST algorithm, for
instance) to ubiquitin itself. There is a recognised superfamily of such
proteins, which have been termed ubiquitin-like proteins (UbL) (Gong et al.
1997, Schwarz et al. 1998). The yeast Smt3 and human SUMO-1 (PIC1,
Sentrin, hSmt3C), SUMO-2 (hSmt3A) and SUMO-3 (hSmt3B) belong to the
same family of UbL proteins, with approximately 50% identity between
themselves, and some 15-30% identity and 40-60% similarity in amino acid
sequence to ubiquitin (Lapenta et al. 1997, Mannen et al. 1996, Kamitani et
al. 1998, Saitoh and Hinchey, 2000). Yeast and human UBC9 are equally capable of
conjugating yeast or human UbLs, but not ubiquitin (Schwarz et al.
1998). SUMO-1, -2 and -3 have the C-terminal glycine necessary for
conjugation to a lysine residue of the target protein, but, unlike ubiquitin, do
not have the Lys48 residue necessary for the formation of polyubiquitin
chains through isopeptide bonds, which are the signal for proteasomal
degradation (Saitoh and Hinchey, 2000). Nevertheless, yeast Smt3 protein
can rescue the phenotype of mutant Mif2, a defective centromere-binding
protein, which results in chromosome missegregation (Meluh and Koshland
1995). SUMO-1, as well as SUMO-3 (and probably also SUMO-2), can be
attached by UBC9 to RanGAP1, a Ran GTPase-activating
protein (Kamitani et al. 1998). This ATP-dependent attachment is essential
for the binding between modified RanGAP1 and RanBP2 binding protein, in
order to form a functional nuclear pore complex, which controls export and
import of molecules through the nuclear envelope (Mahajan et al. 1997,
Matunis et al. 1998, Lee et al. 1998). In addition, small UbL proteins have
been shown to modify the death domains of Fas (Okura et al. 1996) and Tumour
necrosis factor receptor 1 (Okura et al. 1996), as well as PML (a tumour suppressor
implicated in the pathogenesis of acute promyelocytic leukaemia) (Kamitani
et al. 1998b) and Rad51/52 DNA repair proteins (Shen et al. 1996a). Their
conjugating enzyme, UBC9, has been shown by the yeast two-hybrid (Y2H) technique
to interact with the RAD51/52 DNA repair proteins and the master tumour suppressor p53
(Shen et al. 1996b). Another UbL is NEDD8 (Kamitani et al. 1997).

UbLs are conjugated to and cleared from their targets by enzymes.
Several UbL hydrolase enzymes have been identified which convert
precursor UbL to active UbL. Some such enzymes interact with ubiquitin
itself as well as with other UbLs. Proteases involved in the cleavage of
conjugates of UbL with target proteins have been identified, for instance
SENP1 and SUSP-1, which were recently cloned (Kim et al. 2000, Gong et
al. 2000a) and found to cleave SUMO-1, -2 and -3 specifically, but not
ubiquitin or NEDD8. The first human enzyme with a classical USP structure
(Cys, His domains) for which dual specificity to both ubiquitin and a ubiquitin-like
protein was demonstrated was the very recently published USP21 on
chromosome 1q21 (Gong et al. 2000b). However, in contrast to SENP1 and
SUSP-1, this enzyme cleaves ubiquitin and NEDD8, but not SUMO-1, -2 or -3
(Gong et al. 2000b).

The proximal third of the chromosome 21 long arm is an exceptionally
gene-poor region of the human genome, as estimated by a number of criteria
(Shimizu et al., 1995, Yaspo et al., 1995, Gardiner 1996); estimates of
gene density range from one gene per megabase to one gene per six
megabases of genomic DNA. Until recently only three full-length genes had
been mapped in this region: STCH (a member of the hsp70 family) (Brodsky
et al., 1995), RIP140 (a protein functionally interacting with a variety of nuclear
receptors such as the estrogen receptor) (Cavailles et al., 1995) and ANA (a
previously mentioned member of the Tob/BTG1 family of tumour
suppressors) (Kohno et al., 1998). This region is also an example of an
extremely highly methylated region of the human genome.

Groet et al. (1998) describe a high-resolution bacterial contig map of
3.4 Mb of genomic DNA in human chromosome 21q11-q21, encompassing
the region of elevated disomic homozygosity in Down syndrome-associated
abnormal myelopoiesis and leukemia, a region which has also shown a
strong association with Alzheimer's disease. It was suggested that the
high-resolution bacterial clone overlap map should be the basis for deriving a
more complete transcriptional map of that region of the chromosome. It was
hoped that this would lead to an explanation of the chromosome 21q11
linkage in FEOAD families. In particular it was suggested that a modifier
gene in that region could act together with the PSEN1 gene to generate or
modify the AD phenotype.

Valero et al. (1999) have, in parallel but in work published after the first priority
date hereof, identified the gene in human chromosome 21 as well as in mouse
chromosome 16. The gene products have close sequence homologies and
are identical in the Cys, QQD and His regions. The authors suggest a role of
the gene in AD. Valero et al. name the gene USP25.

Further work by the present inventors has revealed and sequenced a
new gene in the proximal third of chromosome 21 and has shown that the
product of this gene has sequence homologies to USPs as well as ubiquitin
specific protease properties. It is postulated that the gene product, and
USPs generally, may have a role in AD. This further work has been
published as Groet et al. 2000, after the first priority date hereof.

According to the invention there is provided a new use of a ubiquitin
specific protease, an inhibitor thereof or the gene therefor, or a specific
inhibitor or promoter of such a gene, in the manufacture of a composition
for use in the treatment, diagnosis or prophylaxis of AD.

According to a further aspect there is provided a new use of a ubiquitin-like
specific protease, an inhibitor thereof or the gene therefor, or a specific
inhibitor or promoter of such a gene, in the manufacture of a composition
for use in the treatment, diagnosis or prophylaxis of AD.

There is also provided an in vitro method of diagnosis of AD in a
human or animal in which a ubiquitin specific protease, or the gene therefor,
is detected in a sample taken from a patient suspected of suffering from AD,
wherein an alteration in the level of expression or in the sequence of the
translated SP is detected. In a further aspect there is also provided an in
vitro method of diagnosis of AD in a human or animal in which a ubiquitin-like
specific protease, or the gene therefor, is detected in a sample taken from a
patient suspected of suffering from AD, wherein an alteration in the level of
expression or in the sequence of the translated SP is detected.

In the diagnostic method, the specific protease (SP) may be detected
immunochemically, using antibodies specifically directed to the SP, or by
contacting the sample with a substrate for the SP under conditions whereby
a proteolytic reaction should take place, and detecting the product of the enzymic
reaction. The product of the reaction may be detected
photometrically (for instance fluorometrically) or immunochemically, for
instance using an antibody to the cleavage product. The antibody may
either react specifically with the cleavage product but not with the substrate,
or else may react with both, in which case the cleavage product and the
substrate must be capable of being separated from one another, for instance
using gel electrophoretic techniques. Methods for the detection of changes
in the DNA sequence or in methylation, or in the quantity or sequence of
mRNA, for instance using PCR techniques, may be used in the diagnostic
method, for instance carried out on the whole organism or on individual
tissue sections, samples or cells, or on several different cells in parallel.
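
The photometric and PCR readouts mentioned above ultimately reduce to simple arithmetic. As a hedged sketch, the Python helpers below (hypothetical names) estimate an initial reaction rate from a fluorescence time course as a least-squares slope and express it relative to a reference sample, and compute a relative mRNA level with the widely used 2^-ΔΔCt method. The text does not specify the assay format, the reference sample or the ΔΔCt approach; these, and the example numbers, are illustrative assumptions.

```python
def initial_rate(times, signal):
    """Least-squares slope of signal versus time (signal units per time unit)."""
    n = len(times)
    if n < 2 or n != len(signal):
        raise ValueError("need at least two paired time points")
    mean_t = sum(times) / n
    mean_s = sum(signal) / n
    num = sum((t - mean_t) * (s - mean_s) for t, s in zip(times, signal))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den


def relative_activity(sample_rate, reference_rate, blank_rate=0.0):
    """Blank-corrected activity of a sample relative to a reference sample."""
    return (sample_rate - blank_rate) / (reference_rate - blank_rate)


def fold_change_ddct(ct_target_sample, ct_ref_sample, ct_target_ctrl, ct_ref_ctrl):
    """Relative mRNA level by the 2^-ddCt method (assumes ~100% PCR efficiency)."""
    d_ct_sample = ct_target_sample - ct_ref_sample  # target vs reference gene
    d_ct_ctrl = ct_target_ctrl - ct_ref_ctrl
    return 2.0 ** -(d_ct_sample - d_ct_ctrl)


if __name__ == "__main__":
    t = [0, 5, 10, 15]                # minutes (hypothetical)
    patient = [100, 160, 220, 280]    # fluorescence units (hypothetical)
    control = [100, 130, 160, 190]
    r_p = initial_rate(t, patient)    # 12.0 units/min
    r_c = initial_rate(t, control)    # 6.0 units/min
    print(relative_activity(r_p, r_c))                # 2.0
    print(fold_change_ddct(22.0, 18.0, 24.0, 18.0))   # 4.0
```

A ratio away from 1 would only indicate an altered level relative to the chosen reference; what magnitude of alteration is diagnostically meaningful is not defined here.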

The present invention also includes a method for investigating the
pathogenesis of AD by the use of a USP.

In a further aspect of the invention there is provided a method for 
investigating the pathogenesis of AD by the use of a ubiquitin-like specific 
protease. 

There is also provided in the present invention a method of
synthesizing a protease having Cys, QQD and His domains as specified in
Sequence ID Nos. 2, 3 and 4, respectively, in which nucleic acid encoding
the protease is introduced into a microorganism or a cell line in a form in
which it can be replicated, transcribed and translated, and the
microorganism or cell line is cultured under conditions whereby the nucleic
acid is replicated, transcribed and translated to form the protease, and the
protease is collected.

Preferably in the invention the gene encoding the USP has been
derived from the respective exons, usually mainly or wholly without the
introns, of the naturally occurring gene from animal, preferably human, DNA,
preferably from human chromosome 21 or the equivalent non-human animal
chromosome, such as mouse chromosome 16.

Preferably the gene is incorporated into a construct, which preferably
comprises dsDNA. Alternatively the construct may comprise ssDNA or
RNA.

In the present invention, the ubiquitin specific protease or ubiquitin-like
specific protease is preferably defined as a protein having ubiquitin
specific protease or ubiquitin-like specific protease activity and including the
following three domains having the specified sequences of the general
formulae I, II and III

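$$\mathrm{G}\,X\,X_{1}\,\mathrm{N}\,X_{2}\,\mathrm{G}\,X_{3}\,\mathrm{T}\,\mathrm{C}\,X_{4}\,X_{5}\,X_{6}\,X_{7}\,X_{8}\,X_{9}\,\mathrm{Q}\,X_{10}\,X_{11} \qquad \text{(I)}$$
$$\mathrm{Q}\,X_{12}\,\mathrm{D}\,X_{13}\,X_{14}\,\mathrm{E}\,X_{15}\,X_{16}\,X_{17}\,X_{18}\,X_{19}\,X_{20}\,X_{21}\,X_{22}\,X_{23}\,X_{24}\,X_{25} \qquad \text{(II)}$$
$$\mathrm{Y}\,X_{26}\,\mathrm{L}\,X_{27}\,X_{28}\,X_{29}\,X_{30}\,X_{31}\,\mathrm{H}\,X_{32}\,\mathrm{G}\,X_{33}\,X_{34}\,X_{35}\,X_{36}\,X_{37}\,\mathrm{G}\,\mathrm{H}\,\mathrm{Y}\,X_{38}\,X_{39}\,X_{40} \qquad \text{(III)}$$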

in which

X is the residue of a non-polar amino acid
X1 is the residue of an amino acid with an uncharged or basic R group
X2 is the residue of an amino acid with an uncharged R group
X3 is the residue of an amino acid with an uncharged R group
X4 is the residue of an amino acid having a relatively large uncharged
R group
X5 is the residue of an amino acid having an uncharged R group
X6 is the residue of an amino acid having a relatively polar uncharged
R group
X7 is the residue of an amino acid having an uncharged R group
X8 is the residue of an amino acid having a non-polar R group
X9 is the residue of an amino acid having an uncharged R group
X10 is the residue of an amino acid having an uncharged R group
X11 is the residue of an amino acid having an uncharged R group
X12 is Q or H
X13 is the residue of an amino acid having a polar R group
X14 is the residue of an amino acid having an uncharged or basic R
group
X15 is the residue of an amino acid having an uncharged R group
X16 is the residue of an amino acid having an uncharged R group
X17 is the residue of an amino acid having an uncharged polar R
group or a basic R group
X18 is the residue of an amino acid having an uncharged or a basic R
group
X19 is the residue of an amino acid having a polar R group
X20 is the residue of an amino acid having an uncharged R group
X21 is the residue of an amino acid having an uncharged polar R
group or an acidic R group
X22 is the residue of an amino acid having an uncharged polar R
group or a basic R group
X23 is the residue of an amino acid having a non-polar R group
X24 is the residue of an amino acid having an uncharged polar R group
or an acidic R group
X25 is the residue of any amino acid
X26 is the residue of any amino acid
X27 is the residue of an uncharged or a basic amino acid
X28 is the residue of an amino acid having a non-polar R group
X29 is the residue of an amino acid having a non-polar R group
X30 is the residue of an amino acid having a non-polar R group
X31 is the residue of an amino acid having an uncharged R group
X32 is the residue of an amino acid having an uncharged polar R
group or an acidic R group
X33 is the residue of an amino acid having an uncharged or an acidic
R group
X34 is the residue of an amino acid having an uncharged R group
X35 is the residue of an amino acid having an uncharged or a basic R
group
X36 is the residue of an amino acid having an uncharged or a basic R
group
X37 is a bond or the residue of an amino acid having an uncharged
polar R group or a basic R group
X38 is the residue of an amino acid having an uncharged R group
X39 is the residue of an amino acid having an uncharged R group, and
X40 is the residue of an amino acid having an uncharged R group.

The first domain mentioned above is also known as the Cys domain,
the second as the QQD domain and the third as the His domain. In the
present invention the R group of an amino acid is the side chain attached
to the α-carbon atom.

In the present invention, the term amino acid having a non-polar R
group includes A, V, L, I, P, F, W, M and G. The term amino acid residue
having an uncharged polar R group includes P, W, G, S, T, C, Y, N and Q.
An amino acid having an acidic R group is selected from D and E. An amino
acid residue having a basic R group is selected from K, R and H.
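
Formulae I and III can be read as sequence motifs over the residue classes just defined. The Python sketch below translates them into regular expressions. It assumes that "uncharged" means non-polar or uncharged polar, treats the "relatively large" and "relatively polar" qualifiers loosely (X4 as uncharged, X6 as uncharged polar), and omits formula II; the helper names and the test peptide are hypothetical and are not taken from the sequence listing.

```python
import re

# Residue classes as defined in the text (one-letter amino acid codes).
NON_POLAR = set("AVLIPFWMG")
UNCHARGED_POLAR = set("PWGSTCYNQ")
ACIDIC = set("DE")
BASIC = set("KRH")
ANY = set("ACDEFGHIKLMNPQRSTVWY")
# Assumption: "uncharged" means non-polar or uncharged polar.
UNCHARGED = NON_POLAR | UNCHARGED_POLAR


def cls(*groups):
    """Regex character class matching any residue in the union of groups."""
    return "[" + "".join(sorted(set().union(*groups))) + "]"


# Formula I (Cys domain): G X X1 N X2 G X3 T C X4 X5 X6 X7 X8 X9 Q X10 X11
CYS_DOMAIN = re.compile("".join([
    "G", cls(NON_POLAR), cls(UNCHARGED, BASIC),       # G X X1
    "N", cls(UNCHARGED), "G", cls(UNCHARGED), "TC",   # N X2 G X3 T C
    cls(UNCHARGED), cls(UNCHARGED),                   # X4 X5 (size not checked)
    cls(UNCHARGED_POLAR), cls(UNCHARGED),             # X6 X7
    cls(NON_POLAR), cls(UNCHARGED),                   # X8 X9
    "Q", cls(UNCHARGED), cls(UNCHARGED),              # Q X10 X11
]))

# Formula III (His domain):
# Y X26 L X27 X28 X29 X30 X31 H X32 G X33 X34 X35 X36 X37 G H Y X38 X39 X40
HIS_DOMAIN = re.compile("".join([
    "Y", cls(ANY), "L", cls(UNCHARGED, BASIC),              # Y X26 L X27
    cls(NON_POLAR), cls(NON_POLAR), cls(NON_POLAR),         # X28 X29 X30
    cls(UNCHARGED), "H", cls(UNCHARGED_POLAR, ACIDIC),      # X31 H X32
    "G", cls(UNCHARGED, ACIDIC), cls(UNCHARGED),            # G X33 X34
    cls(UNCHARGED, BASIC), cls(UNCHARGED, BASIC),           # X35 X36
    cls(UNCHARGED_POLAR, BASIC) + "?",                      # X37 (may be a bond)
    "GHY", cls(UNCHARGED), cls(UNCHARGED), cls(UNCHARGED),  # G H Y X38 X39 X40
]))


def find_domains(protein):
    """Return start offsets of Cys- and His-domain matches in a protein."""
    return {
        "Cys": [m.start() for m in CYS_DOMAIN.finditer(protein)],
        "His": [m.start() for m in HIS_DOMAIN.finditer(protein)],
    }


if __name__ == "__main__":
    # Hypothetical peptide constructed to fit formula I.
    print(find_domains("MKGLQNLGNTCYMNSVLQCLAA"))  # {'Cys': [2], 'His': []}
```

Because the residue classes are broad, a match of this kind is at most a screening filter; the preferred residues given in the following paragraphs narrow the definition further and are not encoded in this sketch.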

X is preferably F or, most preferably, L.

X1 preferably has a basic R group, most preferably being K or R.
Polar uncharged groups X1 are preferably selected from Y and M. It is less
preferred that X1 is an amino acid residue with a polar R group.

X2 is preferably a residue of an amino acid having a non-polar R
group. Preferably it is selected from V, S, U, A, L and F.

In X3, the R group is preferably a low molecular weight group, the X3
residue preferably being selected from A and N.

X4 is preferably a relatively high molecular weight residue, most
preferably selected from Y and W.
X5 is preferably selected from F, M, L and C.

X6 is preferably a relatively low molecular weight residue, most
preferably selected from N and S.

X7 is preferably selected from A, C and S.

X8 and X9 are preferably selected from residues in which the R group
is a C3 or C4 alkyl group, most preferably selected from V, L and I. X9 may
alternatively be S.

X10 is preferably selected from the same groups as X2. Most
preferably it is selected from S, V, Q, A and T.

X11 is preferably selected from Y and L, most preferably L.
X12 is preferably Q.

X13 is preferably selected from residues in which the R group is a C1 or
C3 alkyl group, and is thus preferably V or A, most preferably V.
X14 is preferably selected from S, T, Q, L and H;
X15 is preferably selected from F, L, M and V.
X16 is preferably selected from T, L, N, F and C;

X17 is preferably selected from H, T, R, N and Q;
X18 is preferably selected from K, L, I, V, S, C and Y;
X19 is preferably the residue of an amino acid having a C3 or C4 alkyl
group as R, most preferably selected from L, I and V.
X20 is preferably L.

X21 is preferably the residue of an amino acid having an acidic R
group, and is most preferably D.
X22 is preferably W.

X23 is preferably selected from the same groups as X19. Most
preferably it is L.

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X 24 is preferably a polar uncharged amino acid or an acidic amino 
acid residue, most preferably being selected from E, Q, 0 and A. 

X 25 is preferably an acidic residue, most preferably being D. 

X 26 is preferably selected from R, E, V, N, G, D and I; and 
5 X 27 is preferably selected from H, Y, V or F most preferably being H. 

All of X 28 to X 31 are preferably selected from non-polar amino acids, 
most preferably in which the R group is hydrogen or a C, w 4 alkyl group. X 28 
preferably has a small R group and is most preferably A or G. X 29 and X 30 
are preferably selected from the same groups as X 19 . X 31 may be one of the 
10 same groups as X 19 or may be M. Preferably X 31 is V. 

X32 is preferably selected from E, S and C.

X33 preferably has a polar R group or an acidic R group. Polar R
groups are preferably Q or S.

In X34, the R group is preferably non-polar, for instance being selected
from the same groups as X19, A and G.

X35 preferably has an uncharged polar R group, preferably being selected
from N, H, S and P, most preferably N.

X36 is selected from A, R, N, T, G and S.

X37 is preferably a bond.
X38 is preferably non-polar, more preferably being selected from Y, W,
T, I and V. Most preferably it is W.

X39 is preferably a group in which R is polar, and is most preferably
selected from A, S, T and V.

X40 is preferably selected from Y, L, S and I.
The SP should preferably have at least 200 residues, more preferably
at least 500 residues, for instance between 750 and 1500 residues, most
preferably around 1000 residues.
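The position-by-position preferences above amount to a lookup table. The following minimal Python sketch, assuming one-letter residue codes and using only the explicit residue lists given in the text (class-only or ambiguous preferences such as X2, X24, X33, X37 and X38 are left out), shows how a candidate motif could be screened against them. The names PREFERRED, check_positions and meets_length_preference are illustrative only.

```python
# Preferred residues for some variable positions Xn, transcribed from the
# "preferably selected from" statements above. Positions whose preference is
# stated only as a chemical class, or whose text is ambiguous, are omitted.
PREFERRED: dict[int, set[str]] = {
    3: set("AN"), 4: set("YW"), 5: set("FMLC"), 6: set("NS"), 7: set("ACS"),
    8: set("VLI"), 9: set("VLIS"), 10: set("SVQAT"), 11: set("YL"), 12: {"Q"},
    13: set("VA"), 14: set("STQLH"), 15: set("FLMV"), 16: set("TLNFC"),
    17: set("HTRNQ"), 18: set("KLIVSCY"), 19: set("LIV"), 20: {"L"},
    21: {"D"}, 22: {"W"}, 23: set("LIV"), 25: {"D"}, 26: set("REVNGDI"),
    27: set("HYVF"), 28: set("AG"), 29: set("LIV"), 30: set("LIV"),
    31: set("LIVM"), 32: set("ESC"), 34: set("LIVAG"), 35: set("NHSP"),
    36: set("ARNTGS"), 39: set("ASTV"), 40: set("YLSI"),
}

def check_positions(assignment: dict[int, str]) -> dict[int, bool]:
    """For each position that has a stated preference, report whether the
    residue (one-letter code) falls in the preferred set; others are skipped."""
    return {pos: aa.upper() in PREFERRED[pos]
            for pos, aa in assignment.items() if pos in PREFERRED}

def meets_length_preference(sequence: str, minimum: int = 200) -> bool:
    """The text prefers an SP of at least 200 residues."""
    return len(sequence) >= minimum

print(check_positions({4: "W", 5: "F", 12: "K", 99: "A"}))
# {4: True, 5: True, 12: False}
```

Position 99 is silently ignored because no preference is stated for it; a stricter screen could instead report unknown positions.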

The preferred SP has Cys, QQD and His domains with the sequences
of sequence ID Nos. 2-4, respectively.
The preferred SP has sequence ID No. 1.
The sequence homology of the protein having sequence ID No. 1
(USP25) with other USPs is shown in figure 3 hereof. The sequences of
the USPs are derived from the references mentioned above.

The SP is generally made by genetic engineering techniques. The
nucleic acid in the microorganism used to synthesize the SP may be cDNA
derived from mRNA from human or animal sources. For instance, the
nucleic acid may be constructed from publicly available PACs,
together with cDNA, by suitable restriction enzyme cleavage, ligation and/or
chain extension techniques. Alternatively the gene may be synthesized
through other genetic engineering techniques.

The nucleic acid coding for a SP should encode the protein having
the domains specified above. Preferably the nucleic acid is DNA having the
portions of sequence ID No. 5 which code for the respective domains, or a
sequence encoding the same protein sequence. The DNA sequence may
include the regulatory regions (untranslated regions, UTRs) from the native
gene and may include (GCC)n repeats in the 5' regulatory region or in the
transcribed and/or translated region.

There is also provided in the present invention a non-human animal
model to be used to investigate the mechanism of AD involving USP. The
animal model may have the USP gene knocked out, hemi- or homozygously,
or alternatively may be polysomic for the relevant gene. Alternatively the
gene may be modified by disrupting the transcriptional control whereby the
animal has reduced or increased levels of USP.

The USP of particular relevance to AD is believed to be that found
by the present inventors on the long arm of human chromosome 21, at q11-q21.
It is believed that the gene is likely to be found in the analogous portion of
mouse chromosome 16. The model animal is likely to be a mouse.

The invention also provides a nucleic acid construct comprising a
USP gene encoding a product having Cys, QQD and His regions as
specified in sequence ID Nos. 2, 3 and 4, and an origin of replication.

Preferably the gene encodes sequence ID No. 1 or No. 5, or a USP-active
fragment or homologue thereof. A homologue is the corresponding gene
from a non-human mammal. Most preferably the gene has bases 199-3367
of sequence ID No. 6 (represented as sequence ID No. 7). A further aspect
of the invention provides a nucleic acid construct encoding a product having
Cys, QQD and His regions as defined in the general formulae I, II and III,
respectively, and an origin of replication. A further aspect of the invention
provides a nucleic acid construct encoding a product having Cys, QQD and
His regions of sequence ID Nos. 2, 3 and 4, respectively, or sequences having
at least 20% identity with the specified sequences, preferably at least 50%
identity, and an origin of replication. According to a further aspect of the
invention there is provided a nucleic acid construct encoding a product
having a contiguous sequence of at least 10 residues having at least 50%
identity with a contiguous sequence of residues of sequence ID 1, and an
origin of replication. In this aspect the product should preferably have a
contiguous sequence of residues of sequence ID 1. Additionally or
alternatively the product may have two or more, for instance three or four,
contiguous sequences of at least 10 bases having at least 5% identity with
respective contiguous sequences of sequence ID 1. In each case the level
of similarity of the product with sequence ID 1 is preferably at least 50% (as
determined using the BLAST algorithm), more preferably at least 70%, in the
related regions. The said contiguous sequences of sequence ID 1 may be
the Cys, QQD and/or His regions thereof (sequence ID Nos. 2, 3 and 4) or may
be other regions, for instance regions associated with the recognition of
protein-ubiquitin or protein-ubiquitin-like protein targets.
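Identity thresholds of the kind stated above (for example at least 50% identity over a contiguous stretch of at least 10 residues) can be illustrated with a simple ungapped comparison. This is a sketch only: it slides fixed-length windows without gaps, whereas the text specifies the BLAST algorithm for similarity, so it is not a substitute for a real alignment. The function names are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identical positions between two equal-length, pre-aligned segments."""
    if not a or len(a) != len(b):
        raise ValueError("segments must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)

def has_similar_window(query: str, reference: str,
                       window: int = 10, min_identity: float = 50.0) -> bool:
    """True if any `window`-long stretch of `query` reaches `min_identity`
    against any stretch of `reference` of the same length (no gaps allowed)."""
    for i in range(len(query) - window + 1):
        q = query[i:i + window]
        for j in range(len(reference) - window + 1):
            if percent_identity(q, reference[j:j + window]) >= min_identity:
                return True
    return False

print(percent_identity("AAAAAAAAAA", "AAAAACCCCC"))  # 50.0
```

The brute-force double loop is quadratic in sequence length, which is acceptable for short regions such as the Cys, QQD and His boxes but not for whole-proteome searches.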

Preferably the construct is formed from a vector having an origin of
replication and the gene of interest. Preferably the construct also has a
promoter for the initiation of transcription, generally derived from the vector.
There is also provided a transformed microorganism or a transfected cell
line containing the construct, wherein the construct is capable of being
replicated. The cell line may, for instance, be based on neurones,
neuroblasts or neuroepithelial cells and may be particularly useful for
investigating the steps which take place in the generation of NFTs.

In a further aspect of the invention there is provided a protein product
which is not a functional protease and which has a contiguous sequence of at
least 10 residues having at least 50% identity with a contiguous sequence of
sequence ID 1. Preferably the product has a contiguous sequence of at
least 20 residues, for instance 50 or more, having at least 20%, preferably at
least 50%, identity with a contiguous sequence of sequence ID 1. The said
contiguous sequences of sequence ID 1 may be the Cys, QQD and/or His
regions thereof (sequence ID Nos. 2, 3 and 4) or may be other regions, for
instance regions associated with the recognition of protein-ubiquitin or
protein-ubiquitin-like protein targets. In this aspect the level of similarity of
the product with sequence ID 1 is preferably at least 50% (as determined
using the BLAST algorithm), more preferably at least 70%, in the related regions.
Preferably the product has contiguous sequences with at least 50% identity,
preferably more than 90% identity, with the Cys, QQD and His regions of
sequence ID 1 (sequence ID Nos. 2, 3 and 4). Preferably the product has
contiguous sequences which differ from those of sequence ID Nos. 2, 3 and/or 4
in respect of one, two or three residues, preferably including the Cys residue
which is the 9th residue of sequence ID 2, and/or one or more of the QQD
residues which are the 1st to 3rd residues of sequence ID 3, and/or the His
residue which is the 17th residue of sequence ID 4, most preferably the Cys
residue of sequence ID 2. Preferably this embodiment of the product has
sequence identity in the remaining residues with a contiguous sequence of
at least 20% of sequence ID 1, preferably at least 50%, more preferably at
least 80%, of sequence ID 1.
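The variant embodiment above, a product differing from sequence ID Nos. 2, 3 and/or 4 at one to three positions and preferably including the catalytic Cys at position 9 of sequence ID 2, can be expressed as a positional comparison. In this minimal sketch the reference segment REF is a made-up placeholder (sequence ID 2 itself is not reproduced here), and the function names are hypothetical.

```python
def differing_positions(variant: str, reference: str) -> list[int]:
    """1-based positions at which two equal-length segments differ."""
    if len(variant) != len(reference):
        raise ValueError("segments must have the same length")
    return [i for i, (v, r) in enumerate(zip(variant, reference), start=1) if v != r]

def is_preferred_variant(variant: str, reference: str,
                         key_position: int, max_changes: int = 3) -> bool:
    """True if the variant differs at 1..max_changes positions and one of
    them is the key (for example catalytic) position."""
    diffs = differing_positions(variant, reference)
    return 1 <= len(diffs) <= max_changes and key_position in diffs

# Hypothetical placeholder for sequence ID 2: only the Cys at position 9 matters here.
REF = "AAAAAAAACAAAA"
print(is_preferred_variant("AAAAAAAAAAAAA", REF, key_position=9))  # True  (C9A)
print(is_preferred_variant("GAAAAAAACAAAA", REF, key_position=9))  # False (Cys kept)
```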

The novel protein products of the invention may be competitive
inhibitors of the native protein product of sequence ID 1 and thus have
useful therapeutic activity, for instance in the treatment of AD. Alternatively
they may be useful components of kits for use in methods for investigating
the activity of the protein product of sequence ID 1, or for identifying the
presence or level of the protein product of sequence ID 1 in a biological
sample, for instance in a diagnostic test for AD.

The present invention further provides a protein having sequence ID 1
in substantially isolated form. In this context, "substantially isolated" means
that the protein has been recovered from its source, whether that be a
natural source, for instance human tissue, or a recombinant
microorganism or transformed cell line, in a process in which at least
some of the components of the source have been removed from the protein.
Preferably the protein is an active USP or ubiquitin-like specific protease.

It is believed that the ubiquitin specific protease activity of the protein
having sequence ID 1 is responsible for its implication in the pathogenesis of
AD. USP activity may be determined using the technique described in the
Examples below, in which bacteria are cotransformed with the USP gene
and with a reporter gene encoding a fusion protein which is a ubiquitin-conjugated
detectable protein. The detectable protein may be an enzyme detectable by
direct enzyme reaction, by enzyme-linked immunoassay techniques,
autoradiographically or by direct staining after gel separation under
conditions suitable to separate ubiquitin and cleaved protein from fusion
protein.

From experiments conducted to determine with which proteins the
product having sequence ID 1 interacts, we have found that it interacts
with ubiquitin, polyubiquitin and various ubiquitin precursors, as
well as with HHR23A (Matsutani et al 1994, GenBank Accession No. D21235).
There is also interaction with other ubiquitin precursors, with other
ubiquitin-like proteins and with proteins which are known to interact with
ubiquitin-like proteins, such as Sumo-3 (Mannen et al 1996, Kamitani et al
1996, Saitoh et al 2000, GenBank Accession No. NM 006937) and
ubiquitin-like-specific conjugating enzyme 9 (Schwarz et al 1998, Lee et al
1998, GenBank Accession No. U66867). The novel isolated
protein and the novel protein having no protease activity may therefore be
characterised further by having a positive interaction in a yeast two-hybrid
procedure with one or more, preferably all three, proteins having the
sequences of GenBank Accession Nos. D21235, NM 006937 and U66867.
Ubiquitin-like specific protease activity may be determined using techniques
analogous to those used to determine ubiquitin specific protease activity, by
using a substrate which is a fusion protein of the ubiquitin-like protein of
interest and a detectable protein, and using the usual separation and
immunoassay-based or autoradiographic identification techniques.

In the present invention useful inhibitors of ubiquitin specific protease,
for instance for use in the manufacture of a composition for treating AD, are
ubiquitin analogues which compete with the ubiquitinated substrate and/or
react with the protease enzyme so as to inactivate it. Known inhibitors
include ubiquitin aldehyde, which has a C-terminal aldehyde group instead
of a carboxylic group. Fragments of ubiquitin without an activatable C-terminal
glycine may also be inhibitors. Ubiquitin-like molecules, fragments thereof
and C-terminally modified versions thereof may competitively inhibit USPs.
The inhibitor may be a specific inhibitor and comprise a fusion protein, for
instance with a ubiquitin or UbL component and a recognition protein
component for specific binding to the USP of interest.

The new model of pathogenesis of AD which the present inventors
propose assumes an aberrant increase of the cleavage by USP, particularly
USP25, as the key event triggering the creation of PHFs, and hence the key
event leading to the generation of NFTs and the pathogenesis of AD and
other neurodegenerative conditions characterized by the presence of
neurons with NFTs or similar ubiquitinated inclusion bodies. This increase
in activity would increase the rate of reduction of poly-Ub to mono-Ub, and
increase the de-ubiquitination of the substrate. This would prevent the
otherwise extremely rapid degradation of τ-NFT down the 26S proteasome
route, and cause a temporary increase in the intraneuronal concentration of
τ-NFT. Since τ-NFT has an abnormally high affinity for τ instead of
microtubules, this would cause the seeding of PHFs. Once τ-NFT is
seeded into the initial structure leading to PHF, it becomes resistant to both
ubiquitination and de-ubiquitination, perhaps by being sterically hindered
and inaccessible to enzymes. This would block the removal of the last single
Ub moiety from τ-NFT, hence the presence of mono-Ub τ-NFT in PHF.
The actual generation of τ-NFT from normal τ remains a mystery: it could
either be the second hit necessary for PHF seeding, or be part of normal
neuron physiology, in which τ-NFT is under normal conditions rapidly and
efficiently polyubiquitinated and removed by the 26S proteasome. It is also possible
that full-length τ itself is physiologically efficiently polyubiquitinated and
removed by the 26S proteasome, but an aberrant increase in USP activity as
the primary event causes improper (partial) degradation of τ, by stopping the
normal degradation process in its first steps, through an unusually quick
conversion of polyubiquitinated to monoubiquitinated substrate. The
resulting partially cleaved, mono-ubiquitinated form could be τ-NFT.

Presenilin-1 also uses the 26S proteasome pathway for its rapid and
efficient degradation (Fraser). This degradation is deemed to be extremely
rapid since ubiquitinated forms of presenilin have not been detected, and yet
polyubiquitination is an absolute prerequisite for a protein to be "labelled" for
26S proteasome degradation. One of the most important functions of
presenilin pertinent to the pathogenesis of AD is that it appears to activate
and promote the γ-secretase cleavage of the APP C-terminal fragment which
has already been generated (Borchelt). This γ-secretase cleavage is the
last step necessary for the generation of βAP, the key component of amyloid
plaques. An aberrantly increased activity of USP could slow down the
physiologically rapid degradation of presenilin, increasing its
concentration and activity and producing an effect similar to that of some of the
mutations in presenilin causing FEOAD. Indeed, studies on Presenilin-1
knockout mouse models (deStrooper) support the explanation that clinical
FEOAD mutations in Presenilin-1 result in a gain of function of PS1, and
are not due to haploinsufficiency.

The known USPs in the human genome are UBH on 5q33, FAFX
on Xp11, the HAUSP-related gene on 3p21, HAUSP on 16p13 and USP-1 on
1q32. Related UCH genes are BAP-1 and UCH-L1 on 4p13. These and
other USPs are reviewed in Baker et al, which lists their GenBank Accession
Nos. USPs identified since the publication of Baker et al are USP20,
GenBank Accession No. NM 006676.1, and USP21 (Gong et al 2000b),
GenBank Accession No. AF 233442. The UbL specific proteases SUSP-1 and
SENP-1 have GenBank Accession Nos. AF 196304 and AF 149770,
respectively. All these specific proteases may be useful in the respective
aspects of the invention mentioned above. We now add a new gene on

chromosome 21, USP25, located in 21q11-q21. A genetic change leading to
over-expression of any or all of these could trigger AD pathogenesis.
This does not rule out alterations in presenilin function or in APP
cleavage as alternative primary causes. This hypothesis simply postulates that
sporadic AD could be caused by the deregulation of some of these
enzymes. Enhancement of function and/or copy number of some USPs or UCHs,
and inhibition of function and/or copy number of others, could be
the cause of the physiological disturbance. It is possible, in particular among
USPs, that some members of the superfamily are more specialized for certain
substrates. Overexpression of a competing but less efficient "cousin" in
the superfamily could then have the same effect as underfunctioning of
the specialist USP, which could be caused by a mutation. A mutation causing
some familial cases of Parkinson's disease (PD) has been found in UCH-L1.
In this case, the twofold reduction in enzyme activity caused by the
mutation causes PD. UCH-L1 does not stain AD NFTs, but it does stain
Lewy bodies in PD, and it can cause familial PD. Therefore, this enzyme may
not be the primary candidate to cause AD.

USP25 is located in a highly methylated chromosomal region, and the
CpG island that occupies the 5' regulatory sequences and 5'UTR of USP25
is differentially methylated in a tissue-specific fashion. Methylation could
be the key mechanism through which the spatial and temporal expression
of USP is precisely regulated. The breakdown in this
regulation could be age-related. It could occur as a somatic change, only in
affected neurons. On the other hand, it may be a constitutionally inherited
allele, if it is true that the majority of SAD cases are actually unrecognized FAD with
a late onset. If the theory of neuronal mosaicism for trisomy 21 (Potter) is
proven correct, it would put the increased gene dose of USP25, just as in DS, in
the driving seat causing the generation of NFTs, with or without a significant
additive effect of the trisomy of APP. Other events affecting the 5' regulatory
sequences could play a role: we have identified a (GCC)n repeat in the 5'
regulatory region of USP25. Such repeats are known to frequently expand in
populations, and even somatically between cells of one adult organism. Such
repeat expansions are a cause of many genetic diseases, including myotonic
dystrophy, fragile X syndrome and Huntington's disease.
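Both features mentioned above, the CpG island and the (GCC)n repeat, can be located computationally. The sketch below is an illustration under stated assumptions: it scores CpG content with the widely used observed/expected CpG ratio, and finds the longest uninterrupted GCC run. The thresholds usually applied to call a CpG island are not part of this document and are deliberately not hard-coded; the function names are hypothetical.

```python
import re

def longest_gcc_run(seq: str) -> tuple[int, int]:
    """(0-based start, number of GCC units) of the longest uninterrupted
    (GCC)n run; (-1, 0) if none is present."""
    best_start, best_units = -1, 0
    for m in re.finditer(r"(?:GCC)+", seq.upper()):
        units = len(m.group(0)) // 3
        if units > best_units:
            best_start, best_units = m.start(), units
    return best_start, best_units

def cpg_stats(seq: str) -> tuple[float, float]:
    """(GC fraction, observed/expected CpG ratio), using
    obs/exp = (CpG count * length) / (C count * G count)."""
    s = seq.upper()
    if not s:
        raise ValueError("empty sequence")
    c, g = s.count("C"), s.count("G")
    gc_fraction = (c + g) / len(s)
    obs_exp = s.count("CG") * len(s) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp

print(longest_gcc_run("ATGCCGCCGCCTA"))  # (2, 3)
```

Tracking repeat length in this way is how an expanded allele would show up: the same locus returning a larger unit count.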

The invention is illustrated further in the accompanying drawings in
which:

Figure 1 represents a map of human chromosome 21;

Figure 2 represents a map showing the location of exon-trapped
products from the experiments reported below;

Figure 3 represents sequence homologies of USPs; and

Figure 4 represents the results of the experiment illustrating USP
properties reported below.

The following specific description describes the work which has been 
carried out. 



Examples

Tumour Samples 

We identified a portion of human chromosome 21 homozygously deleted
in non-small cell lung carcinoma (NSCLC) for further study. The region
contained the DNA marker with the highest NSCLC-associated loss of
heterozygosity (LOH), reported by Kohno et al. We found a shared region of
overlap (SRO) for the hemizygous loss in other NSCLCs. The aim of the current
work is to identify genes in the SRO which have a potential role in tumour
suppression.

A total of 42 fresh NSCLC cases have been analyzed from the
Croatian Tumour Bank (CTB), an initiative with set rules and criteria for
the accumulation of fresh clinical tumor specimens for molecular studies
(Spaventi et al., 1994). Of these, half (including tumors #47 and #61) were
samples recently studied for LOH of the NM23-H1 gene (Bosnar et
al., 1997), and the other half were fresh tumors (data obtainable from CTB,
which also lists the tumor stage in the TNM system, grade, size and survival
data). In each case, the tumor and normal lung tissue specimens (as
evaluated by the surgeon) were frozen in liquid nitrogen in the operating
room and further stored at -70°C. Genomic DNA was isolated using standard
procedures (Sambrook et al., 1989). For each sample, 4 µm serial frozen
sections were cut, mounted on glass slides, and stained with hematoxylin-eosin
(H&E). A pathologist confirmed the histologic type of the tumor and
evaluated the percentage of normal cells within the tumor. Only samples with
less than 20% non-tumor cells were used in this study.

Markers and LOH Analysis 

Microsatellite analysis was performed using polymerase chain reaction
(PCR) with appropriate primer pairs (sequences and PCR conditions as in
the Genome Data Base, Johns Hopkins University, Baltimore, MD), where
only the forward primer of each pair was 5'-fluorescently labeled with Applied
Biosystems (ABI; Foster City, CA) Big Dyes™ (6-FAM or HEX). Amplification
products were analysed using an ABI 310 Genetic Analyzer. Size standards
(GeneScan 350) were mixed with every sample for accurate sizing; the
mixture of denatured fragments was separated by
electrophoresis through a 47 cm capillary (module GS STR POP4 C) for
approximately 30 min. Raw data were analyzed using GeneScan and
Genotyper software. LOH ratios were calculated exactly as described in the
GeneScan Applications manual provided by ABI. For each individual allele's
fluorescence level, an average of 3 independent electrophoresis-analysis
cycles on the ABI 310 was used for calculation.
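The LOH ratio calculation itself is deferred to the GeneScan manual, which is not reproduced here. As a hedged sketch, the code below assumes the allele-ratio form frequently used for fluorescent microsatellite LOH analysis, (T1/T2)/(N1/N2), where T and N are tumour and normal signals for alleles 1 and 2, each averaged over the 3 replicate runs described above. The exact formula and any cut-off for calling LOH in the ABI manual may differ, and allele_ratio is a hypothetical name.

```python
from statistics import mean

def allele_ratio(tumour: tuple[list[float], list[float]],
                 normal: tuple[list[float], list[float]]) -> float:
    """Assumed LOH ratio (T1/T2)/(N1/N2).

    Each argument holds (allele-1 signals, allele-2 signals), one value per
    replicate electrophoresis run; replicates are averaged before the ratio
    is formed, mirroring the 3-run averaging described in the text."""
    t1, t2 = mean(tumour[0]), mean(tumour[1])
    n1, n2 = mean(normal[0]), mean(normal[1])
    return (t1 / t2) / (n1 / n2)

# Hypothetical peak heights, not data from this study.
ratio = allele_ratio(([1200, 1180, 1220], [400, 410, 390]),
                     ([900, 880, 920], [860, 870, 850]))
print(round(ratio, 2))  # 2.87
```

Normalising the tumour allele ratio by the patient's own normal ratio corrects for unequal amplification of the two alleles, which is why matched normal lung tissue was collected for each case.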

Fluorescence In Situ Hybridization (FISH) 

Unstained 4 µm-thick paraffin block sections were fixed to glass slides,
and a standard pretreatment protocol was followed for formalin-fixed,
paraffin-embedded slides. P1-derived artificial chromosome (PAC) DNA was
labelled with digoxigenin-11-dUTP (Boehringer Mannheim, Germany).
Approximately 0.5 µg of each labelled PAC DNA sample was mixed with 5 µg
of Cot1 DNA (Gibco BRL, Gaithersburg, MD), precipitated, denatured,
allowed to preanneal, and then applied to a denatured slide and hybridized
overnight. Slides were washed and signal detected using anti-digoxigenin-rhodamine,
followed by DAPI counterstain. Images were captured using a
Zeiss Axioskop microscope equipped with a charge-coupled device (CCD)
camera (Photometrics, Tucson, AZ) connected to an Apple Powermac 8100 computer.
Images were captured at 3 levels of focus, and each level was examined for
signals using SmartCapture software (Vysis, Inc., Chicago, IL). Only nuclei
with signals were counted at each level, and the number of signals in each
cell was determined. B: FISH using a pool of PACs 90B5, 126N20 and BAC
391 12 as a probe on the paraffin-embedded sections of tumour #61. Two-signal
nuclei are predominant. C: FISH using a pool of PACs 73M5 and
135E14 as a probe on the paraffin-embedded sections of tumour #61.
Single-signal nuclei are predominant.
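The scoring rule above (count only nuclei with signals, record the number of signals per nucleus, then report the predominant class) can be sketched as a simple tally. This assumes each nucleus has already been reduced to a single signal count after examining the 3 focal levels; the function names and the example counts are hypothetical, not data from this study.

```python
from collections import Counter

def signal_distribution(signals_per_nucleus: list[int]) -> dict[int, float]:
    """Fraction of scored nuclei carrying each signal count.
    Nuclei with no signal are excluded, as only nuclei with signals were counted."""
    scored = [n for n in signals_per_nucleus if n > 0]
    if not scored:
        return {}
    counts = Counter(scored)
    return {k: counts[k] / len(scored) for k in sorted(counts)}

def predominant_class(signals_per_nucleus: list[int]) -> int:
    """Signal count carried by the largest fraction of scored nuclei."""
    dist = signal_distribution(signals_per_nucleus)
    if not dist:
        raise ValueError("no nuclei with signals")
    return max(dist, key=dist.get)

example = [2, 2, 1, 2, 0, 3]  # hypothetical per-nucleus counts
print(signal_distribution(example))  # {1: 0.2, 2: 0.6, 3: 0.2}
print(predominant_class(example))    # 2
```

Comparing the predominant class between two probe pools on the same tumour, as in panels B and C, is what distinguishes a retained region (two signals) from a hemizygously lost one (one signal).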
Northern Blot Analysis 

The cDNAs were labelled by random priming and hybridized to human
multiple tissue Northern blots (Clontech, Palo Alto, CA) containing 2 µg
poly(A)+ RNA per lane using the protocol recommended by the manufacturer. The
exposure was for 14 hr to Molecular Dynamics (Sunnyvale, CA)
Phosphorimager screens. The I.M.A.G.E. Consortium (Lennon et al., 1996)
cDNA clone ID 824710 and the Unigene clone A002B43 were used as
labelled probes in separate experiments.
Cleavage Analysis of Ubiquitin-Met-β-Galactosidase Fusion Protein

This analysis was performed essentially as described (Everett et al.,
1997). The model fusion protein ubiquitin-Met-β-galactosidase in a pACYC184
(CmR replicon) was represented by the plasmid pACYC-Ub-Met-β-gal, a kind
gift of R. Everett. Plasmid pRB105, containing the Saccharomyces cerevisiae
ubiquitin-specific protease UBP2 in an IPTG-inducible pBR322 (AmpR
replicon), was a kind gift of R. Baker, and was used as a positive control. The
new gene USP25 was cloned from nucleotide position 203 to nucleotide
position 3367 (numbering as in GenBank AF134213) into the SacI/SalI cloning
sites of the IPTG-inducible Escherichia coli expression vector pQE30
(Qiagen, Chatsworth, CA). E. coli XL-1 Blue cells were transformed
using a standard rubidium chloride-heat shock method with the combination
of pACYC-Ub-Met-β-gal and either the pQE30 vector, pQE30-USP25 or
pRB105, and each of the 3 cotransformants was selected on medium
containing chloramphenicol (42 µg/ml) and carbenicillin (75 µg/ml). Western
blots were prepared by electrotransfer to a nitrocellulose membrane
(Schleicher & Schuell, Keene, NH). The β-galactosidase-containing bands
were detected by an anti-β-galactosidase polyclonal rabbit antiserum (a kind
gift of R. Everett) using an enhanced chemiluminescence (ECL) assay kit
(rpn 2132; Amersham, Arlington Heights, IL) under conditions recommended
by the manufacturer.
Exon Trapping

DNA from the 2 PACs (73M5 and 135E14) was digested to completion using
BamHI and BglII, or partially using Sau3AI, and exon trapping
was performed with the resulting fragments in the pLSP3B vector using standard
technology.
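The three enzymes named above recognise GGATCC (BamHI), AGATCT (BglII) and GATC (Sau3AI). All three leave the same 5'-GATC overhang, so fragments from the complete and partial digests have mutually compatible ends, and GATC is short enough to occur frequently, which is why Sau3AI is used only partially. The minimal sketch below locates these sites in a sequence; the demo fragment and the function name are hypothetical.

```python
# Recognition sequences of the enzymes named above. All are palindromic,
# so searching one strand finds every site on both strands.
SITES = {"BamHI": "GGATCC", "BglII": "AGATCT", "Sau3AI": "GATC"}

def find_sites(seq: str, site: str) -> list[int]:
    """0-based start positions of every occurrence of `site` in `seq`."""
    s, hits = seq.upper(), []
    i = s.find(site)
    while i != -1:
        hits.append(i)
        i = s.find(site, i + 1)
    return hits

demo = "AAGGATCCTTAGATCTAA"  # hypothetical fragment
for enzyme, site in SITES.items():
    print(enzyme, find_sites(demo, site))
# BamHI [2]
# BglII [10]
# Sau3AI [3, 11]
```

Every BamHI and BglII site contains a Sau3AI site, as the output shows.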

Identification and cloning of USP25 

Twelve sequenced exon-trapped products, when analysed using
BLAST-N against public sequence databases, revealed clusters of
overlapping cDNA clones. Sequences of our exon-trapped products matched
exactly the sequences of the cDNAs forming contigs with a large open
reading frame (ORF). In three cases (see Fig. 2), from EST 824710 to
AA209364, from AA307805 to AA081200 and from N92952r to Z45010, our
trapped exon sequences served to bridge the gaps in the gene sequence
using PCR and suitable restriction, ligation and chain extension techniques.
The combined sequence (sequence ID No. 5 and GenBank accession number
AF134213) revealed a 199 bp 5'UTR, a start codon, an ORF of 3165
nucleotides encoding a protein of 1055 amino acids, a stop codon, a 3'UTR
of 435 nucleotides and a polyadenylation signal. The total length assembled
(without the polyA tail) is 3803 nucleotides. On multiple human tissue Northern
blots (Fig. 3) a band of 4.1 kb is visible in all 16 tissues tested (including
normal human lung tissue) with varying intensity. It is most prominent in
skeletal muscle and testis, and the latter tissue also reveals a prominent
shorter hybridising transcript of 1.4 kb. All tissues also show a larger, weaker
band of 4.9 kb, which could be due to an alternative polyadenylation site.
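The reported lengths can be cross-checked with simple arithmetic: a 3165-nucleotide ORF is 3165 / 3 = 1055 codons, matching the stated 1055-residue protein, and the fragment cloned for expression (positions 203 to 3367, inclusive) also spans 3165 nucleotides. The short sketch below only restates numbers from the text.

```python
ORF_NT = 3165                         # ORF length stated in the text
PROTEIN_AA = 1055                     # protein length stated in the text
CLONED_START, CLONED_END = 203, 3367  # inclusive positions of the cloned fragment

codons = ORF_NT // 3
fragment_nt = CLONED_END - CLONED_START + 1   # +1 because both ends are included

assert ORF_NT % 3 == 0, "an ORF should be a whole number of codons"
assert codons == PROTEIN_AA
print(codons, fragment_nt)  # 1055 3165
```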

In the course of this analysis, the whole genomic sequence of the two
PACs (73M5 and 135E14) was made publicly available by the German Human
Genome Sequencing Consortium (EMBL accession numbers AJ010597 and
AJ010598). Comparison of the genomic sequence with the overlapping cDNA
clones and exon sequences revealed that 12 out of the 24 exons had been
exon-trapped (hatched rectangles in Fig. 2). It also became apparent that the
region immediately preceding the first exon of the gene comprises the
known chromosome 21 CpG island at D21S382 (also known as the LL56 NotI
linking clone on the NotI physical map of 21q, Ichikawa et al., 1993).
When the deduced polypeptide sequence was compared to Swissprot
and other public databases using BLAST-P, a clear pattern of significant
homologies (E = 10^-6 to 10^-29) to proteins across the evolutionary spectrum of
eukaryotes was found (Fig. 4): all of these proteins belonged to the
superfamily of ubiquitin specific proteases (USPs) or ubiquitin carboxy-terminal
hydrolases (UCHs) (Baker et al., 1992, Swanson et al., 1996,
Everett et al., 1997, Wilkinson 1997, Hansen-Hagge et al., 1998, Jensen et
al., 1998, Fujiwara et al., 1998, Wilkinson and Hochstrasser 1998, The C.
elegans Sequencing Consortium 1998). The polypeptide sequences were
most highly conserved around the three domains (the Cys box, the QQD box
and the His box, Fig. 4) known to be essential for the main function of these
enzymes: the cleavage of ubiquitin at its carboxy terminus from extension
proteins (ubiquitin precursors) and from ubiquitinated proteins and protein
fragments targeted for degradation by the 26S proteasome pathway
(Wilkinson and Hochstrasser 1998). The cysteine residue at position 178
and the histidine residues at positions 599 and 607 (marked with an asterisk
in Fig. 4), which have been shown to be an absolute requirement for the function of
USPs and UCHs (Amerik et al., 1997, Hansen-Hagge et al., 1998,
Wilkinson and Hochstrasser 1998), were found in the correct positions in the
sequence of the new gene. The name of the protein, USP25, has been
approved by the HUGO Nomenclature Committee.

The novel protein (USP25) cleaves ubiquitin from carboxy-terminal fusion proteins

The ability of USP25 to cleave a model ubiquitin fusion protein
substrate was investigated by co-expression in E. coli. The complete coding
sequence of USP25 was cloned into a T5-driven, IPTG-inducible expression
vector (pQE30): the new gene USP25 was cloned from nucleotide position
203 to nucleotide position 3367 (numbering as in sequence ID No. 5) into the
SacI/SalI cloning sites of this IPTG-inducible E. coli expression vector. As a
positive control, the plasmid pRB105, containing a UBP2 gene encoding an
S. cerevisiae ubiquitin specific protease in an IPTG-inducible, AmpR
vector, was used. The XL-1 Blue strain of E. coli was co-transformed with the
plasmid containing a ubiquitin-Met-β-galactosidase model fusion protein in
an IPTG-inducible, chloramphenicol-resistant vector, in addition to either the
pQE30 vector, pQE30-USP25 or the positive control (pRB105). Each of the
3 co-transformants was selected on medium containing chloramphenicol (42
µg/ml) and carbenicillin (75 µg/ml). Co-transformants were grown to
exponential phase and IPTG-induced, and the crude protein extracts from these
cultures were analysed by Western blot using an anti-β-galactosidase
antibody. The Western blots were prepared by electrotransfer to a
nitrocellulose membrane (Schleicher and Schuell). The β-galactosidase-containing
bands were detected by an anti-β-galactosidase polyclonal rabbit
antiserum using an enhanced chemiluminescence assay kit (ECL, Amersham
rpn2132) under conditions recommended by the manufacturer.

As can be seen in Fig. 4, the uncleaved Ub-Met-β-gal substrate (band
labelled with an asterisk in Fig. 4, lane 4) converts to a band 8 kDa smaller
(triangle in Fig. 4) in the cells co-transformed with either USP25 (lanes 5, 6) or
the yeast UBP2-expressing plasmid (lanes 9, 10). Constitutive expression of
USP25 (lane 5) is sufficient to cleave the low levels of model substrate
completely. The more prominent and highly induced band migrating
slightly further in the gel than the de-ubiquitinated cleavage product is the
truncated form of β-galactosidase expressed by the XL-1 Blue bacteria
(compare lanes 1, 2 in Fig. 4). This result demonstrates that the novel gene
product named USP25 can efficiently function as a de-ubiquitinating enzyme.

From the homologies in the functional domains and from its ability to
hydrolyse the bond between the C-terminal double glycine of ubiquitin and
the linking methionine residue (Fig. 5), it can be concluded that USP25 is a
member of the ubiquitin specific protease family.
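The size of the observed shift is consistent with removal of ubiquitin itself: cleavage after the C-terminal Gly-Gly releases the 76-residue ubiquitin moiety and leaves Met-β-galactosidase. The back-of-envelope sketch below assumes a rule-of-thumb average residue mass of 110 Da (an assumption, not a figure from this document); the true mass of ubiquitin is roughly 8.5 kDa, and gel-based estimates are approximate in any case.

```python
# Human ubiquitin (76 residues), ending in the Gly-Gly motif after which USPs cleave.
UBIQUITIN = (
    "MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQ"
    "QRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG"
)
AVG_RESIDUE_DA = 110.0  # rule-of-thumb average residue mass (assumption)

def approx_mass_kda(protein: str) -> float:
    """Rough mass estimate: residue count times an average residue mass."""
    return len(protein) * AVG_RESIDUE_DA / 1000.0

print(len(UBIQUITIN), round(approx_mass_kda(UBIQUITIN), 2))  # 76 8.36
```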

We identified a portion of human chromosome 21 homozygously deleted
in non-small cell lung carcinoma (NSCLC) for further study. The region
contained the DNA marker with the highest NSCLC-associated loss of
heterozygosity (LOH), reported by Kohno et al. We found a shared region of
overlap (SRO) for the hemizygous loss in other NSCLCs. The aim of the current
work is to identify genes in the SRO which, whilst having a potential role in tumour
suppression, may also be associated with AD.

Determination of Proteins with which USP25 interacts 
Functional analysis of USP25 was performed to detect the cellular proteins
that interact with USP25 through protein-protein interaction, using the yeast
two-hybrid (Y2H) approach. The yeast Saccharomyces cerevisiae has
well-characterised ubiquitin-activating, -conjugating and -ligating enzymatic
machinery capable of ubiquitinating human proteins (Scheffner et al. 1998).
A human brain cDNA library cloned in a "prey" vector was co-transfected into
yeast cells together with USP25 cloned in a "bait" vector. Because the
ubiquitin-cleaving activity of USP25 has been demonstrated (Groet et al.
2000), this technique could in principle detect the natural cellular
substrates of USP25 for ubiquitin cleavage and de-ubiquitination. However,
ubiquitin cleavage is very rapid (Wilkinson and Hochstrasser 1998), so a fully
active USP25 could cleave and dissociate from its natural substrates before
the interaction could be detected by Y2H. In addition, artificial
cross-de-ubiquitination of the yeast's own proteins by overexpressed USP25
could in theory be harmful to the yeast cell and/or to the molecular
interactions required for Y2H. For these reasons we performed Y2H using
USP25-C178A, a site-directed mutant we recently engineered in which the key
Cys residue of the Cys domain is mutated. This residue is conserved in all
UCHs and USPs identified so far. Its mutation abolishes the capacity of USP25
to cleave ubiquitin but should not interfere with the binding of USP25 to its
natural ubiquitinated substrates.
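The numbering of the C178A mutant can be checked against the sequence listing. This Python sketch transcribes residues 161-192 of SEQ ID No. 1 into one-letter code. It then searches that fragment with the fixed residues of general formula I from claim 3, treating every variable position as a wildcard, and reports the residue number of the Cys. The fragment string, the relaxed regular expression and the function name are illustrative assumptions, not part of the original work. The match reproduces SEQ ID No. 2. Its Cys, the 9th residue of that motif, falls at residue 178.

```python
import re

# Residues 161-192 of SEQ ID No. 1 (USP25), transcribed into one-letter code;
# only this fragment is reproduced here.
FRAGMENT_START = 161
FRAGMENT = "RKRQDKAPVGLKNVGNTCWFSAVIQSLFNLLE"

# Fixed residues of general formula I (G X X1 N X2 G X3 T C X4..X9 Q X10 X11);
# every variable position is relaxed to a wildcard, and the Cys is captured.
CYS_BOX = re.compile(r"G..N.G.T(?P<cys>C)......Q..")


def catalytic_cys_number(fragment: str, start: int) -> int:
    """Return the full-length residue number of the Cys in the first Cys box."""
    match = CYS_BOX.search(fragment)
    if match is None:
        raise ValueError("no Cys box found in fragment")
    # match.start("cys") is a 0-based offset into the fragment
    return start + match.start("cys")


box = CYS_BOX.search(FRAGMENT).group()
print(box, catalytic_cys_number(FRAGMENT, FRAGMENT_START))
```

The claim 4 residue sets could replace the wildcards to give a stricter motif. Even the relaxed pattern finds a single hit in this fragment.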

The Y2H experiment used USP25-C178A cloned in the yeast two-hybrid "bait"
vector pAS2 (Clontech), co-transformed into yeast cells together with a human
adult brain cDNA library in the "prey" vector pACT2 (Clontech). Interacting
events were visualised by the transcriptional activation of all three reporter
genes: Ade2, Mel1 and His3. Interacting "prey" sequences were verified by
PCR sequencing on an ABI 310 automated sequencer using universal vector
primers, and analysed by BLAST search against non-redundant genome and
transcriptome sequence databases. The GenBank accession numbers of the
sequences found to interact are given in Table 1.
Table 1. Summary of frequency and identities of specific interacting
proteins from human brain with USP25-C178A, detected using the yeast
two-hybrid technique

Interacting protein (by decreasing       Number of specific independent   Accession number
frequency of detection)                  clones "fished" by Y2H

HHR23A                                   o clones                         D21235
SUMO-3                                   8 clones                         NM 006937
human UBC9                               5 clones                         U66867
polyUbiquitin                            4 clones                         AB009010
Ubiquitin                                3 clones                         X04803
Ran BP2 protein                          1 clone                          NM 006267
Various ubiquitin-like precursors        4 clones
(1 or 2 clones each)
Other proteins (1 or 2 clones each)      10 clones


HHR23A (hRAD23A, a human homologue of the yeast RAD23 protein) has been
isolated as a primary interacting protein by the same Y2H technique using
E6AP as "bait" (Kumar et al. 1999). E6AP (human papillomavirus E6-associated
protein) is one of the two ubiquitin ligases detected so far (which attach
ubiquitin and so label proteins for degradation) for the master tumour
suppressor p53 (Scheffner et al. 1993). The proteins p53 and HHR23A are the
only two proven targets of this ubiquitin ligase so far (Kumar et al. 1999).
Since USP25 shows a strong target preference for HHR23A (see Table 1), and
both HHR23A and p53 are ubiquitinated by E6AP, it is possible that both are
also de-ubiquitinated by USP25. That the Y2H screen with USP25 did not pick
up p53 is understandable, because p53 is expressed only at trace levels in
normal tissues and accumulates and is activated only following DNA damage
or other stimuli for programmed cell death (apoptosis). It is possible that
overexpression of USP25 (and other USPs) leads to the neuronal toxicity and
progressive neuronal cell death seen in AD.

Figure Legends 

Figure 1. Identification of the shared region of overlap (SRO) for
hemizygous deletions in 21q11-q21 in NSCLC. A: Cytogenetic map, NotI
long-range physical map (Ishikawa et al. 1993), YAC contig (Nizetic et al. 1994,
Shinizu et al. 1995 and Bosch et al. 1996) and bacterial contig (Groet et al.
1998) are shown in consecutive horizontal layers above the line showing the
markers used in the LOH analysis (oval symbols). Markers are named as in the
Genome Database (prefix "D21" omitted). In the column under each marker,
"X" (LOH), "+" (absence of LOH) and "U" (uninformative, homozygous result)
are shown for that marker in the set of eleven Croatian Tumour Bank (CTB)
tumours, or in individual tumours #47 and #61. NT = not tested. For
comparison with our data, markers used as probes on the genomic Southern
blot of the NSCLC cell line and/or in the LOH analysis of fresh tumours in the
study by Kohno and co-workers (Kohno et al. 1998) are indicated above the
empty bar representing the homozygous deletion they found. In our data,
hatched bars indicate hemizygous deletions, and black filled bars indicate
segments showing absence of LOH or deletions. Squared symbols "X" and "+"
stand for predominantly single and predominantly double signals,
respectively, detected by FISH on interphase nuclei of paraffin-embedded
sections of tumour #61, when the PAC clones named and shown as bold lines
in the PAC contig above the marker line were used as probes.
Figure 2. Trapped exons (hatched rectangles) and exons deduced from
overlapping sequence analysis (white rectangles) defining the exon-intron
structure of the new gene USP25. The top half shows two PACs, 73M5 and
135E14, also used as FISH probes in Fig. 1, which were the source of
genomic DNA for exon trapping. Exon locations on the PACs are shown as
vertical bars, and the 50 kbp scale bar refers to this part. The bottom half
consists of overlapping cDNA fragments corresponding to the exons above
them, drawn to the same scale (500 bp scale bar shown). Names of cDNA
clones are as in the dbEST and UniGene databases; 824710 is the address of
the clone in the IMAGE Consortium collection. The complete cDNA sequence
of the whole gene is the new GenBank entry with accession number
AF134213.

Figure 3. Comparison of the protein sequence of USP25 with other
eukaryotic members of the USP superfamily. The protein BAP-1 actually
belongs to the family of ubiquitin C-terminal hydrolases, a distinct subfamily
of this superfamily, and shows homology only at the single key amino acids in
the Cys and His domains. Two reports localise the highly homologous
sequences for the HAUSP gene to 3p21 (Kashuba et al. 1997) and 16p13
(Robinson et al. 1998), respectively.

Figure 4. Demonstration of the de-ubiquitinating activity of USP25 on a
model ubiquitin fusion protein. Western blot of an SDS-PAGE gel, detected
using an anti-β-galactosidase antiserum. Lanes 1, 2: E. coli XL-1 Blue cells
alone (in all cases the second lane of each pair is +IPTG). Lanes 3, 4: the
same cells co-transformed with pACYC-Ub-Met-β-galactosidase, the plasmid
encoding the model fusion protein (band labelled with an asterisk). Lanes 5,
6: as lanes 3, 4, except that pQE30-USP25 (the full-length USP25 gene cloned
in the pQE30 expression vector) was added instead of pQE30. Lanes 7, 8: as
lanes 3, 4, except that pRB105 (encoding the yeast de-ubiquitinating enzyme
UBP2) was transformed instead of pQE30. Lanes 9, 10: over-exposure of lanes
7, 8. Note the presence of the de-ubiquitinated Met-β-galactosidase, about
8 kDa smaller (band labelled with a triangle).

CLAIMS

1. Use of ubiquitin specific protease (USP), an inhibitor thereof or the
gene therefor, or a specific inhibitor or promoter of such a gene, in the
manufacture of a composition for use in the treatment, diagnosis or
prophylaxis of AD.

2. Use of ubiquitin-like specific protease, an inhibitor thereof or the gene
therefor, or a specific inhibitor or promoter of such a gene, in the
manufacture of a composition for use in the treatment, diagnosis or
prophylaxis of AD.

3. Use according to claim 1 or claim 2 in which the protease is a protein
having ubiquitin specific protease or ubiquitin-like specific protease activity
which includes the following three domains having the specified sequences of
the general formulae I, II and III:

G X X1 N X2 G X3 T C X4 X5 X6 X7 X8 X9 Q X10 X11   I
Q X12 D X13 X14 E X15 X16 X17 X18 X19 X20 X21 X22 X23 X24 X25   II
Y X26 L X27 X28 X29 X30 ...   III
in which

X is the residue of a non-polar amino acid
X1 is the residue of an amino acid with an uncharged or basic R group
X2 is the residue of an amino acid with an uncharged R group
X3 is the residue of an amino acid with an uncharged R group
X4 is the residue of an amino acid having a relatively large uncharged R group
X5 is the residue of an amino acid having an uncharged R group
X6 is the residue of an amino acid having a relatively polar uncharged R group
X7 is the residue of an amino acid having an uncharged R group
X8 is the residue of an amino acid having a non-polar R group
X9 is the residue of an amino acid having an uncharged R group
X10 is the residue of an amino acid having an uncharged R group
X11 is the residue of an amino acid having an uncharged R group
X12 is Q or H
X13 is the residue of an amino acid having a polar R group
X14 is the residue of an amino acid having an uncharged or basic R group
X15 is the residue of an amino acid having an uncharged R group
X16 is the residue of an amino acid having an uncharged R group
X17 is the residue of an amino acid having an uncharged polar R group or a
basic R group
X18 is the residue of an amino acid having an uncharged or a basic R group
X19 is the residue of an amino acid having a polar R group
X20 is the residue of an amino acid having an uncharged R group
X21 is the residue of an amino acid having an uncharged polar R group or an
acidic R group
X22 is the residue of an amino acid having an uncharged polar R group or a
basic R group
X23 is the residue of an amino acid having a non-polar R group
X24 is the residue of an amino acid having an uncharged polar R group or an
acidic R group
X25 is the residue of any amino acid
X26 is the residue of any amino acid
X27 is the residue of an uncharged or a basic amino acid
X28 is the residue of an amino acid having a non-polar R group
X29 is the residue of an amino acid having a non-polar R group
X30 is the residue of an amino acid having a non-polar R group
X31 is the residue of an amino acid having an uncharged R group
X32 is the residue of an amino acid having an uncharged polar R group or an
acidic R group
X33 is the residue of an amino acid having an uncharged or an acidic R group
X34 is the residue of an amino acid having an uncharged R group
X35 is the residue of an amino acid having an uncharged or a basic R group
X36 is the residue of an amino acid having an uncharged or a basic R group
X37 is a bond or the residue of an amino acid having an uncharged polar
R group or a basic R group
X38 is the residue of an amino acid having an uncharged R group
X39 is the residue of an amino acid having an uncharged R group and
X40 is the residue of an amino acid having an uncharged R group.

4. Use according to claim 3 in which:

X is F or L;
X1 is K or R;
X2 is selected from V, S, U, A, L and F;
X3 is A or N;
X4 is Y or W;
X5 is selected from F, M, L and C;
X6 is N or S;
X7 is selected from A, C and S;
X8 is selected from V, L and I;
X9 is selected from V, L, I and S;
X10 is selected from S, V, Q, A and T;
X11 is Y or L;
X12 is Q;
X13 is V or A;
X15 is selected from F, L, M and V;
X19 is selected from L, I and V;
X20 is L;
X21 is D;
X22 is W;
X23 is selected from L, I and V;
X24 is selected from E, Q, D and A;
X25 is D;
X27 is selected from H, Y, V and F;
X28 is A or G;
X29 is selected from L, I and V;
X30 is selected from L, I and V;
X31 is selected from L, I, V and M;
X32 is selected from E, S and C;
X33 is selected from D, E, Q and S;
X34 is selected from L, I, V, A and G;
X35 is selected from N, H, S and P;
X37 is a bond;
X38 is selected from Y, W, T, I and V;
X39 is selected from A, S, T and V; and
X40 is selected from Y, L, S and I.



5. Use according to claim 3 or claim 4 in which:
X14 is selected from S, T, Q, L and H;
X16 is selected from T, L, N, F and C;
X17 is selected from H, T, R, N and Q;
X18 is selected from K, L, I, V, S, C and Y;
X26 is selected from R, E, V, N, G, D and I; and
X36 is selected from A, R, N, T, G and S.

6. Use according to claim 3 in which the three domains have the 
sequences I.D. Nos 2-4, respectively. 

7. Use according to any preceding claim in which the protease has
sequence ID 1, or is a fragment thereof with ubiquitin specific protease or
ubiquitin-like specific protease activity, or a homologue from a non-human
animal.

8. Use according to any of claims 1 to 6 in which the protease has
sequence ID 5.

9. Use according to any of claims 1 to 6 in which the gene is used and 
the gene comprises sequence ID No. 6. 

10. Use according to any of claims 1 to 6 in which the gene is used and 
the gene comprises sequence ID No 7. 

11. An in vitro method of diagnosis of AD in a human or animal in which a
ubiquitin specific protease, or the gene therefor is detected in a sample taken
from a patient suspected of suffering from AD, wherein an alteration in the
level of expression of protease or in the sequence of the translated protease
as compared to normal levels is detected.

12. An in vitro method of diagnosis of AD in a human or animal in which a
ubiquitin-like specific protease, or the gene therefor is detected in a sample
taken from a patient suspected of suffering from AD, wherein an alteration in
the level of expression of protease or in the sequence of the translated
protease as compared to normal levels is detected.

13. A method for investigating the pathogenesis of AD by the use of a
USP, or the gene therefor.

14. A method for investigating the pathogenesis of AD by the use of a 
ubiquitin-like specific protease. 

15. A method according to any of claims 11 to 14 in which the protease is 
as defined in any of claims 3 to 8. 

16. A method of synthesizing a protease having Cys, QQD and His
domains specified in sequence ID'S Nos. 2, 3 and 4, respectively in which
nucleic acid encoding the protease is introduced into a microorganism or a
cell-line in a form in which it can be replicated, transcribed and translated,
and the microorganism or cell-line is cultured under conditions whereby the
nucleic acid is replicated, transcribed and translated to form the protease,
and the protease is collected.

17. A method according to claim 16 in which the nucleic acid introduced 
into the microorganism, or cell-line, as the case may be, includes the 
sequence represented by ID. No. 6 or No. 7. 
18. A nucleic acid construct comprising a USP gene encoding a protein
product having Cys, QQD and His regions as specified in sequence ID'S Nos.
2, 3 and 4 and having an origin of replication.

19. A construct according to claim 18 formed from a vector having an
origin of replication and preferably also a promoter for initiation of translation.
20. A construct according to claim 18 or 19 in which the gene has
sequence ID. No. 6 or No. 7.
21. A nucleic acid construct encoding a product having a contiguous
sequence of at least 10 residues having at least 50% identity with a
contiguous sequence of residues of sequence ID 1, and an origin of
replication.

22. A construct according to claim 21 in which the product has a
contiguous sequence of at least 20, for instance 50 or more, residues having
at least 20%, preferably at least 50%, identity with a contiguous sequence of
residues of sequence ID 1.

23. A construct according to claim 21 or 22 in which the product has two
or more, for instance three or four, contiguous sequences of at least 10 bases
having at least 5% identity with respective contiguous sequences of sequence
ID 1.

24. A transformed microorganism containing the construct of any of claims 
18 to 23. 

25. A transfected cell-line containing the construct of any of claims 18 to
23.

26. A cell-line according to claim 25 which is derived from neuroblasts, 
neuroepithelial cells or neurones. 

27. USP produced by a method according to claim 16 or 17.

28. A protein product which is not a functional protease which has a
contiguous sequence of at least 10 residues having at least 50% identity with
a contiguous sequence of sequence ID 1.

29. A protein product according to claim 28 which has a contiguous
sequence of at least 20 residues, for instance 50 or more, having at least
20%, preferably at least 50%, identity with a contiguous sequence of
sequence ID 1.

30. A protein product according to claim 28 or 29 which has contiguous
sequences which differ from those of sequence ID'S 2, 3 and/or 4 in respect of
one, two or three residues, preferably including the Cys residue which is the
9th residue of sequence ID 2, and/or one or more of the QQD residues which
are the 1st to 3rd residues of sequence ID 3, and/or the His residue which is
the 17th residue of sequence ID 3, most preferably the Cys residue of
sequence ID 3.

31. A product according to any of claims 28 to 30 which is a competitive
inhibitor of ubiquitin specific protease or ubiquitin-like specific protease
activity.

32. A protein having sequence ID 1 in substantially isolated form. 

33. A substantially isolated protein which is characterised by its ability to
interact with one or more and preferably all those of the proteins having the
sequences of GenBank Accession Nos. D21235 (HHR23A), NM 006937
(SUMO-3), U66867 (UBC9), AB009010 (polyUbiquitin) and X04803
(Ubiquitin), as determined using the encoding gene in a yeast-two-hybrid
method.

34. A non-human animal model useful to investigate the mechanism of AD
involving USP, which has the gene for USP homologous to that found at human
chromosome 21q11-q21 knocked out, hemi- or homozygously, is polysomic for
that gene, or has that gene modified by disrupting the transcriptional or
translational control whereby the animal has reduced or increased levels of
the corresponding USP.

35. A non-human animal according to claim 34 which is a mouse. 
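Claims 21 to 23, 28 and 29 define products by percentage identity between contiguous stretches of residues, but they do not say how identity is computed. This Python sketch assumes the simplest reading, an ungapped position-by-position comparison over every pair of equal-length windows. A real analysis would more likely use a gapped alignment tool such as BLAST. SEQ ID No. 2 stands in for SEQ ID No. 1 to keep the example short. The function names are hypothetical. The Cys-to-Ala variant mirrors the kind of non-functional product contemplated in claims 28 to 30.

```python
def best_window_identity(query: str, reference: str, window: int) -> float:
    """Highest fraction of identical residues between any length-`window`
    stretch of `query` and any length-`window` stretch of `reference`,
    compared position by position without gaps."""
    if not 1 <= window <= min(len(query), len(reference)):
        raise ValueError("window must be between 1 and the shorter sequence length")
    best = 0
    for i in range(len(query) - window + 1):
        q = query[i:i + window]
        for j in range(len(reference) - window + 1):
            r = reference[j:j + window]
            # count identical residues at aligned positions
            best = max(best, sum(a == b for a, b in zip(q, r)))
    return best / window


def meets_identity_threshold(query: str, reference: str,
                             window: int = 10, min_identity: float = 0.5) -> bool:
    """True if some pair of `window`-residue stretches reaches `min_identity`
    (defaults mirror claim 21: 10 residues, 50% identity)."""
    return best_window_identity(query, reference, window) >= min_identity


SEQ_ID_2 = "GLKNVGNTCWFSAVIQSL"           # Cys box, part of SEQ ID No. 1
C_TO_A = SEQ_ID_2.replace("TC", "TA")   # Cys -> Ala, as in USP25-C178A
print(best_window_identity(C_TO_A, SEQ_ID_2, 10))
print(meets_identity_threshold("W" * 10, SEQ_ID_2))
```

The double loop is quadratic in sequence length. That is acceptable for a short illustration but not for scanning databases.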
SEQUENCE LISTING

<110> School of Pharmacy

<120> Diagnosis and Treatment of Alzheimer's Disease

<130> HMJ03220WO

<140> PCT/GB 0002423
<141> 2000-06-22

<150> 9914589.8
<151> 1999-06-22

<150> 0008162.0
<151> 2000-04-03

<160> 7

<170> PatentIn Ver. 2.1

<210> 1
<211> 1055
<212> PRT

<213> Homo sapiens
<400> 1

Met Thr Val Glu Gln Asn Val Leu Gln Gln Ser Ala Ala Gln Lys His
1 5 10 15

Gln Gln Thr Phe Leu Asn Gln Leu Arg Glu Ile Thr Gly Ile Asn Asp
20 25 30

Thr Gln Ile Leu Gln Gln Ala Leu Lys Asp Ser Asn Gly Asn Leu Glu
35 40 45

Leu Ala Val Ala Phe Leu Thr Ala Lys Asn Ala Lys Thr Pro Gln Gln
50 55 60

Glu Glu Thr Thr Tyr Tyr Gln Thr Ala Leu Pro Gly Asn Asp Arg Tyr
65 70 75 80

Ile Ser Val Gly Ser Gln Ala Asp Thr Asn Val Ile Asp Leu Thr Gly
85 90 95

Asp Asp Lys Asp Asp Leu Gln Arg Ala Ile Ala Leu Ser Leu Ala Glu
100 105 110

Ser Asn Arg Ala Phe Arg Glu Thr Gly Ile Thr Asp Glu Glu Gln Ala
115 120 125

Ile Ser Arg Val Leu Glu Ala Ser Ile Ala Glu Asn Lys Ala Cys Leu
130 135 140

Lys Arg Thr Pro Thr Glu Val Trp Arg Asp Ser Arg Asn Pro Tyr Asp
145 150 155 160

Arg Lys Arg Gln Asp Lys Ala Pro Val Gly Leu Lys Asn Val Gly Asn
165 170 175

Thr Cys Trp Phe Ser Ala Val Ile Gln Ser Leu Phe Asn Leu Leu Glu
180 185 190

Phe Arg Arg Leu Val Leu Asn Tyr Lys Pro Pro Ser Asn Ala Gln Asp
195 200 205

Leu Pro Arg Asn Gln Lys Glu His Arg Asn Leu Pro Phe Met Arg Glu
210 215 220

Leu Arg Tyr Leu Phe Ala Leu Leu Val Gly Thr Lys Arg Lys Tyr Val
225 230 235 240

Asp Pro Ser Arg Ala Val Glu Ile Leu Lys Asp Ala Phe Lys Ser Asn
245 250 255

Asp Ser Gln Gln Gln Asp Val Ser Glu Phe Thr His Lys Leu Leu Asp
260 265 270

Trp Leu Glu Asp Ala Phe Gln Met Lys Ala Glu Glu Glu Thr Asp Glu
275 280 285

Glu Lys Pro Lys Asn Pro Met Val Glu Leu Phe Tyr Gly Arg Phe Leu
290 295 300

Ala Val Gly Val Leu Glu Gly Lys Lys Phe Glu Asn Thr Glu Met Phe
305 310 315 320

Gly Gln Tyr Pro Leu Gln Val Asn Gly Phe Lys Asp Leu His Glu Cys
325 330 335

Leu Glu Ala Ala Met Ile Glu Gly Glu Ile Glu Ser Leu His Ser Glu
340 345 350

Asn Ser Gly Lys Ser Gly Gln Glu His Trp Phe Thr Glu Leu Pro Pro
355 360 365
Val Leu Thr Phe Glu Leu Ser Arg Phe Glu Phe Asn Gln Ala Leu Gly
370 375 380

Arg Pro Glu Lys Ile His Asn Lys Leu Glu Phe Pro Gln Val Leu Tyr
385 390 395 400

Leu Asp Arg Tyr Met His Arg Asn Arg Glu Ile Thr Arg Ile Lys Arg
405 410 415

Glu Glu Ile Lys Arg Leu Lys Asp Tyr Leu Thr Val Leu Gln Gln Arg
420 425 430

Leu Glu Arg Tyr Leu Ser Tyr Gly Ser Gly Pro Lys Arg Phe Pro Leu
435 440 445

Val Asp Val Leu Gln Tyr Ala Leu Glu Phe Ala Ser Ser Lys Pro Val
450 455 460

Cys Thr Ser Pro Val Asp Asp Ile Asp Ala Ser Ser Pro Pro Ser Gly
465 470 475 480

Ser Ile Pro Ser Gln Thr Leu Pro Ser Thr Thr Glu Gln Gln Gly Ala
485 490 495

Leu Ser Ser Glu Leu Pro Ser Thr Ser Pro Ser Ser Val Ala Ala Ile
500 505 510

Ser Ser Arg Ser Val Ile His Lys Pro Phe Thr Gln Ser Arg Ile Pro
515 520 525

Pro Asp Leu Pro Met His Pro Ala Pro Arg His Ile Thr Glu Glu Glu
530 535 540

Leu Ser Val Leu Glu Ser Cys Leu His Arg Trp Arg Thr Glu Ile Glu
545 550 555 560

Asn Asp Thr Arg Asp Leu Gln Glu Ser Ile Ser Arg Ile His Arg Thr
565 570 575

Ile Glu Leu Met Tyr Ser Asp Lys Ser Met Ile Gln Val Pro Tyr Arg
580 585 590

Leu His Ala Val Leu Val His Glu Gly Gln Ala Asn Ala Gly His Tyr
595 600 605

Trp Ala Tyr Ile Phe Asp His Arg Glu Ser Arg Trp Met Lys Tyr Asn
610 615 620
Asp Ile Ala Val Thr Lys Ser Ser Trp Glu Glu Leu Val Arg Asp Ser
625 630 635 640

Phe Gly Gly Tyr Arg Asn Ala Ser Ala Tyr Cys Leu Met Tyr Ile Asn
645 650 655

Asp Lys Ala Gln Phe Leu Ile Gln Glu Glu Phe Asn Lys Glu Thr Gly
660 665 670

Gln Pro Leu Val Gly Ile Glu Thr Leu Pro Pro Asp Leu Arg Asp Phe
675 680 685

Val Glu Glu Asp Asn Gln Arg Phe Glu Lys Glu Leu Glu Glu Trp Asp
690 695 700

Ala Gln Leu Ala Gln Lys Ala Leu Gln Glu Lys Leu Leu Ala Ser Gln
705 710 715 720

Lys Leu Arg Glu Ser Glu Thr Ser Val Thr Thr Ala Gln Ala Ala Gly
725 730 735

Asp Pro Glu Tyr Leu Glu Gln Pro Ser Arg Ser Asp Phe Ser Lys His
740 745 750

Leu Lys Glu Glu Thr Ile Gln Ile Ile Thr Lys Ala Ser His Glu His
755 760 765

Glu Asp Lys Ser Pro Glu Thr Val Leu Gln Ser Ala Ile Lys Leu Glu
770 775 780

Tyr Ala Arg Leu Val Lys Leu Ala Gln Glu Asp Thr Pro Pro Glu Thr
785 790 795 800

Asp Tyr Arg Leu His His Val Val Val Tyr Phe Ile Gln Asn Gln Ala
805 810 815

Pro Lys Lys Ile Ile Glu Lys Thr Leu Leu Glu Gln Phe Gly Asp Arg
820 825 830

Asn Leu Ser Phe Asp Glu Arg Cys His Asn Ile Met Lys Val Ala Gln
835 840 845

Ala Lys Leu Glu Met Ile Lys Pro Glu Glu Val Asn Leu Glu Glu Tyr
850 855 860

Leu Ile Ile Gly Leu Glu Asn Phe Gln Arg Glu Ser Tyr Ile Asp Ser
885 890 895

Leu Leu Phe Leu Ile Cys Ala Tyr Gln Asn Asn Lys Glu Leu Leu Ser
900 905 910

Lys Gly Leu Tyr Arg Gly His Asp Glu Glu Leu Ile Ser His Tyr Arg
915 920 925

Arg Glu Cys Leu Leu Lys Leu Asn Glu Gln Ala Ala Glu Leu Phe Glu
930 935 940

Ser Gly Glu Asp Arg Glu Val Asn Asn Gly Leu Ile Ile Met Asn Glu
945 950 955 960

Phe Ile Val Pro Phe Leu Pro Leu Leu Leu Val Asp Glu Met Glu Glu
965 970 975

Lys Asp Ile Leu Ala Val Glu Asp Met Arg Asn Arg Trp Cys Ser Tyr
980 985 990

Leu Gly Gln Glu Met Glu Pro His Leu Gln Glu Lys Leu Thr Asp Phe
995 1000 1005

Leu Pro Lys Leu Leu Asp Cys Ser Met Glu Ile Lys Ser Phe His Glu
1010 1015 1020

Pro Pro Lys Leu Pro Ser Tyr Ser Thr His Glu Leu Cys Glu Arg Phe
1025 1030 1035 1040

Ala Arg Ile Met Leu Ser Leu Ser Arg Thr Pro Ala Asp Gly Arg
1045 1050 1055



<210> 2 
<211> 18 
<212> PRT 

<213> Homo sapiens 
<400> 2 

Gly Leu Lys Asn Val Gly Asn Thr Cys Trp Phe Ser Ala Val Ile Gln
1 5 10 15

Ser Leu 
<210> 3
<211> 17
<212> PRT

<213> Homo sapiens
<400> 3

Gln Gln Asp Val Ser Glu Phe Thr His Lys Leu Leu Asp Trp Leu Glu
1 5 10 15

Asp 



<210> 4 
<211> 21 
<212> PRT 

<213> Homo sapiens 
<400> 4 

Tyr Arg Leu His Ala Val Leu Val His Glu Gly Gln Ala Asn Ala Gly
1 5 10 15 

His Tyr Trp Ala Tyr 
20 



<210> 5 
<211> 1087 
<212> PRT 

<213> Homo sapiens 
<400> 5 

Met Thr Val Glu Gln Asn Val Leu Gln Gln Ser Ala Ala Gln Lys His
1 5 10 15

Gln Gln Thr Phe Leu Asn Gln Leu Arg Glu Ile Thr Gly Ile Asn Asp
20 25 30

Thr Gln Ile Leu Gln Gln Ala Leu Lys Asp Ser Asn Gly Asn Leu Glu
35 40 45

Leu Ala Val Ala Phe Leu Thr Ala Lys Asn Ala Lys Thr Pro Gln Gln
50 55 60

Glu Glu Thr Thr Tyr Tyr Gln Thr Ala Leu Pro Gly Asn Asp Arg Tyr
65 70 75 80

Ile Ser Val Gly Ser Gln Ala Asp Thr Asn Val Ile Asp Leu Thr Gly
85 90 95

Asp Asp Lys Asp Asp Leu Gln Arg Ala Ile Ala Leu Ser Leu Ala Glu
100 105 110

Ser Asn Arg Ala Phe Arg Glu Thr Gly Ile Thr Asp Glu Glu Gln Ala
115 120 125

Ile Ser Arg Val Leu Glu Ala Ser Ile Ala Glu Asn Lys Ala Cys Leu
130 135 140

Lys Arg Thr Pro Thr Glu Val Trp Arg Asp Ser Arg Asn Pro Tyr Asp
145 150 155 160

Arg Lys Arg Gln Asp Lys Ala Pro Val Gly Leu Lys Asn Val Gly Asn
165 170 175

Thr Cys Trp Phe Ser Ala Val Ile Gln Ser Leu Phe Asn Leu Leu Glu
180 185 190

Phe Arg Arg Leu Val Leu Asn Tyr Lys Pro Pro Ser Asn Ala Gln Asp
195 200 205

Leu Pro Arg Asn Gln Lys Glu His Arg Asn Leu Pro Phe Met Arg Glu
210 215 220

Leu Arg Tyr Leu Phe Ala Leu Leu Val Gly Thr Lys Arg Lys Tyr Val
225 230 235 240

Asp Pro Ser Arg Ala Val Glu Ile Leu Lys Asp Ala Phe Lys Ser Asn
245 250 255

Asp Ser Gln Gln Gln Asp Val Ser Glu Phe Thr His Lys Leu Leu Asp
260 265 270

Trp Leu Glu Asp Ala Phe Gln Met Lys Ala Glu Glu Glu Thr Asp Glu
275 280 285

Glu Lys Pro Lys Asn Pro Met Val Glu Leu Phe Tyr Gly Arg Phe Leu
290 295 300

Ala Val Gly Val Leu Glu Gly Lys Lys Phe Glu Asn Thr Glu Met Phe
305 310 315 320

Gly Gln Tyr Pro Leu Gln Val Asn Gly Phe Lys Asp Leu His Glu Cys
325 330 335

Leu Glu Ala Ala Met Ile Glu Gly Glu Ile Glu Ser Leu His Ser Glu
340 345 350

Asn Ser Gly Lys Ser Gly Gln Glu His Trp Phe Thr Glu Leu Pro Pro
355 360 365

Val Leu Thr Phe Glu Leu Ser Arg Phe Glu Phe Asn Gln Ala Leu Gly
370 375 380

Arg Pro Glu Lys Ile His Asn Lys Leu Glu Phe Pro Gln Val Leu Tyr
385 390 395 400

Leu Asp Arg Tyr Met His Arg Asn Arg Glu Ile Thr Arg Ile Lys Arg
405 410 415

Glu Glu Ile Lys Arg Leu Lys Asp Tyr Leu Thr Val Leu Gln Gln Arg
420 425 430

Leu Glu Arg Tyr Leu Ser Tyr Gly Ser Gly Pro Lys Arg Phe Pro Leu
435 440 445

Val Asp Val Leu Gln Tyr Ala Leu Glu Phe Ala Ser Ser Lys Pro Val
450 455 460

Cys Thr Ser Pro Val Asp Asp Ile Asp Ala Ser Ser Pro Pro Ser Gly
465 470 475 480

Ser Ile Pro Ser Gln Thr Leu Pro Ser Thr Thr Glu Gln Gln Gly Ala
485 490 495

Leu Ser Ser Glu Leu Pro Ser Thr Ser Pro Ser Ser Val Ala Ala Ile
500 505 510

Ser Ser Arg Ser Val Ile His Lys Pro Phe Thr Gln Ser Arg Ile Pro
515 520 525

Pro Asp Leu Pro Met His Pro Ala Pro Arg His Ile Thr Glu Glu Lys
530 535 540

Leu Ser Val Leu Glu Ser Cys Leu His Arg Trp Arg Thr Glu Ile Glu
545 550 555 560

Asn Asp Thr Arg Asp Leu Gln Glu Ser Ile Ser Arg Ile His Arg Thr
565 570 575

Ile Glu Leu Met Tyr Ser Asp Lys Ser Met Ile Gln Val Pro Tyr Arg
580 585 590

Leu His Ala Val Leu Val His Glu Gly Gln Ala Asn Ala Gly His Tyr
595 600 605

Trp Ala Tyr Ile Phe Asp His Arg Glu Ser Arg Trp Met Lys Tyr Asn
610 615 620

Asp Ile Ala Val Thr Lys Ser Ser Trp Glu Glu Leu Val Arg Asp Ser
625 630 635 640

Phe Gly Gly Tyr Arg Asn Ala Ser Ala Tyr Cys Leu Met Tyr Ile Asn
645 650 655

Asp Lys Ala Gln Phe Leu Ile Gln Glu Glu Phe Asn Lys Glu Thr Gly
660 665 670

Gln Pro Leu Val Gly Ile Glu Thr Leu Pro Pro Asp Leu Arg Asp Phe
675 680 685

Val Glu Glu Asp Asn Gln Arg Phe Glu Lys Glu Leu Glu Glu Trp Asp
690 695 700

Ala Gln Leu Ala Gln Lys Ala Leu Gln Glu Lys Leu Leu Ala Ser Gln
705 710 715 720

Lys Leu Arg Glu Ser Glu Thr Ser Val Thr Thr Ala Gln Ala Ala Gly
725 730 735

Asp Pro Glu Tyr Leu Glu Gln Pro Ser Arg Ser Asp Phe Ser Lys His
740 745 750

Leu Lys Glu Glu Thr Ile Gln Ile Ile Thr Lys Ala Ser His Glu His
755 760 765

Glu Asp Lys Ser Pro Glu Thr Val Leu Gln Ser Ile Met Met Thr Pro
770 775 780

Asn Met Gln Gly Ile Ile Met Ala Ile Gly Lys Ser Arg Ser Val Tyr
785 790 795 800

Asp Arg Cys Gly Pro Glu Ala Gly Phe Phe Lys Ala Ile Lys Leu Glu
805 810 815

Tyr Ala Arg Leu Val Lys Leu Ala Gln Glu Asp Thr Pro Pro Glu Thr
820 825 830

Asp Tyr Arg Leu His His Val Val Val Tyr Phe Ile Gln Asn Gln Ala
835 840 845

Pro Lys Lys Ile Ile Glu Lys Thr Leu Leu Glu Gln Phe Gly Asp Arg
850 855 860

Asn Leu Ser Phe Asp Glu Arg Cys His Asn Ile Met Lys Val Ala Gln
865 870 875 880

Ala Lys Leu Glu Met Ile Lys Pro Glu Glu Val Asn Leu Glu Glu Tyr
885 890 895

Glu Glu Trp His Gln Asp Tyr Arg Lys Phe Arg Glu Thr Thr Met Tyr
900 905 910

Leu Ile Ile Gly Leu Glu Asn Phe Gln Arg Glu Ser Tyr Ile Asp Ser
915 920 925

Leu Leu Phe Leu Ile Cys Ala Tyr Gln Asn Asn Lys Glu Leu Leu Ser
930 935 940

Lys Gly Leu Tyr Arg Gly His Asp Glu Glu Leu Ile Ser His Tyr Arg
945 950 955 960

Arg Glu Cys Leu Leu Lys Leu Asn Glu Gln Ala Ala Glu Leu Phe Glu
965 970 975

Ser Gly Glu Asp Arg Glu Val Asn Asn Gly Leu Ile Ile Met Asn Glu
980 985 990

Phe Ile Val Pro Phe Leu Pro Leu Leu Leu Val Asp Glu Met Glu Glu
995 1000 1005

Lys Asp Ile Leu Ala Val Glu Asp Met Arg Asn Arg Trp Cys Ser Tyr
1010 1015 1020

Leu Gly Gln Glu Met Glu Pro His Leu Gln Glu Lys Leu Thr Asp Phe
1025 1030 1035 1040

Leu Pro Lys Leu Leu Asp Cys Ser Met Glu Ile Lys Ser Phe His Glu
1045 1050 1055

Pro Pro Lys Leu Pro Ser Tyr Ser Thr His Glu Leu Cys Glu Arg Phe
1060 1065 1070

Ala Arg Ile Met Leu Ser Leu Ser Arg Thr Pro Ala Asp Gly Arg
1075 1080 1085
<210> 6

<211> 3803 

<212> DNA 

<213> Homo sapiens 



<400> 6 

acagtcggcg tttcgccgcc tgcccgcggt gcccgcgcac gccggccgcc atcgccttcg 60 
cgcctggctg gcgggggcgc tgtcctccca ggccgtccgc gccgctccct ggagctcggc 120 
ggagcgcggc agccagggcc ggcggaggcg cgaggagccg ggcgccaccg ccgccgccgc 180 
cgccgccgcc gcgggggcca tgaccgtgga gcagaacgtg ctgcagcaga gcgcggcgca 240 
gaagcaccag cagacgtttt tgaatcaact gagagaaatt acggggatta atgacaccca 300 
gatactacag caagccttga aggatagtaa tggaaacttg gaattagcag tggctttcct 360 
tactgcgaag aatgctaaga cccctcagca ggaggagaca acttactacc aaacagcact 420 
tcctggcaat gatagataca tcagtgtggg aagccaagca gatacaaatg tgattgatct 480
cactggagat gataaagatg atcttcagag agcaattgcc ttgagtttgg ccgaatcaaa 540 
cagggcattc agggagactg gaataactga tgaggaacaa gccattagca gagttcttga 600 
agccagcata gcagagaata aagcatgttt gaagaggaca cctacagaag tttggaggga 660 
ttctcgaaac ccttatgata gaaaaagaca ggacaaagct cccgttgggc taaagaatgt 720 
tggcaatact tgttggttta gtgctgttat tcagtcatta tttaatcttt tggaatttag 780 
aagattagtt ctgaattaca agcctccatc aaatgctcaa gatttacccc gaaaccaaaa 840 
ggaacatcgg aatttgcctt ttatgcgtga gctgaggtat ctatttgcac ttcttgttgg 900 
taccaaaagg aagtatgttg atccatcaag agcagttgaa attcttaagg atgctttcaa 960 
atcaaatgac tcacagcagc aagatgtgag tgagtttaca cacaaattat tagattggtt 1020 
agaagatgcc ttccaaatga aagctgaaga ggagacggat gaagagaagc caaagaaccc 1080 
catggtagag ttgttctatg gcagattcct ggctgtggga gtacttgaag gtaaaaaatt 1140 
tgaaaacact gaaatgtttg gtcagtaccc acttcaggtc aatgggttca aagatctgca 1200 
tgagtgccta gaagctgcaa tgattgaagg agaaattgag tctttacatt cagagaattc 1260 
aggaaaatca ggccaagagc attggtttac tgaattacca cctgtgttaa catttgaatt 1320 
gtcaagattt gaatttaatc aggcattggg aagaccagaa aaaattcaca acaaattaga 1380 
atttccccaa gttttatatt tggacagata catgcacaga aacagagaaa taacaagaat 1440 
taagagggaa gagatcaaga gactgaaaga ttacctcacg gtattacaac aaaggctaga 1500 
aagatattta agctatggtt ccggtcccaa acgattcccc ttggtagatg ttcttcagta 1560 
tgcattggaa tttgcctcaa gtaaacctgt ttgcacttct cctgttgacg atattgacgc 1620 
tagttcccca cctagtggtt ccataccatc acagacatta ccaagcacaa cagaacaaca 1680 
gggagcccta tcttcagaac tgccaagcac atcaccttca tcagttgctg ccatttcatc 1740 
gagatcagta atacacaaac catttactca gtcccggata cctccagatt tgcccatgca 1800 
tccggcacca aggcacataa cggaggaaga actttctgtg ctggaaagtt gtttacatcg 1860 
ctggaggaca gaaatagaaa atgacaccag agatttgcag gaaagcatat ccagaatcca 1920 
tcgaacaatt gaattaatgt actctgacaa atctatgata caagttcctt atcgattaca 1980 
tgccgtttta gttcacgaag gccaagctaa tgctgggcac tactgggcat atatttttga 2040 
tcatcgtgaa agcagatgga tgaagtacaa tgatattgct gtgacaaaat catcatggga 2100 
agagctagtg agggactctt ttggtggtta tagaaatgcc agtgcatact gtttaatgta 2160 
cataaatgat aaggcacagt tcctaataca agaggagttt aataaagaaa ctgggcagcc 2220 
ccttgttggt atagaaacat taccaccgga tttgagagat tttgttgagg aagacaacca 2280 
acgatttgaa aaagaactag aagaatggga tgcacaactt gcccagaaag ctttgcagga 2340 
aaagctttta gcgtctcaga aattgagaga gtcagagact tctgtgacaa cagcacaagc 2400 
agcaggagac ccagaatatc tagagcagcc atcaagaagt gatttctcaa agcacttgaa 2460 
agaagaaact attcaaataa ttaccaaggc atcacatgag catgaagata aaagtcctga 2520
aacagttttg cagtcggcaa ttaagttgga atatgcaagg ttggttaagt tggcccaaga 2580 
agacacccca ccagaaaccg attatcgttt acatcatgta gtggtctact ttatccagaa 2640 
ccaggcacca aagaaaatta ttgagaaaac attactagaa caatttggag atagaaattt 2700 
gagttttgat gaaaggtgtc acaacataat gaaagttgct caagccaaac tggaaatgat 2760 
aaaacctgaa gaagtaaact tggaggaata tgaggagtgg catcaggatt ataggaaatt 2820 
cagggaaaca actatgtatc tcataattgg gctagaaaat tttcaaagag aaagttatat 2880 
agattccttg ctgttcctca tctgtgctta tcagaataac aaagaactct tgtctaaagg 2940 
cttatacaga ggacatgatg aagaattgat atcacattat agaagagaat gtttgctaaa 3000 
attaaatgag caagccgcag aactcttcga atctggagag gatcgagaag taaacaatgg 3060 
tttgattatc atgaatgagt ttattgtccc atttttgcca ttattactgg tggatgaaat 3120 
ggaagaaaag gatatactag ctgtagaaga tatgagaaat cgatggtgtt cctaccttgg 3180 
tcaagaaatg gaaccacacc tccaagaaaa gctgacagat tttttgccaa aactgcttga 3240 
ttgttctatg gagattaaaa gtttccatga gccaccgaag ttaccttcat attccacgca 3300 
tgaactctgt gagcgatttg cccgaatcat gttgtccctc agtcgaactc ctgctgatgg 3360 
aagataaact gcacactttc cctgaacaca ctgtataaac tctttttagt tcttaaccct 3420 
tgccttcctg tcacagggtt tgcttgttgc tgctatagtt tttaactttt ttttatttta 3480 
ataactgcaa aagacaaaat gactatacag actttagtca gactgcagac aataaagctg 3540 
aaaatcgcat ggcgctcaga cattttaacc ggaactgatg tataatcaca aatctaattg 3600 
attttattat ggcaaaacta tgcttttgcc accttcctgt tgcagtatta ctttgctttt 3660 
atcttttctt tctcaacagc tttccattca gtctggatcc ttccatgact acagccattt 3720 
aagtgttcag cactgtgtac gatacataat atttggtagc ttgtaaatga aataaagaat 3780 
aaagttttat ttatggctac cta 3803



<210> 7 

<211> 3169 

<212> DNA

<213> Homo sapiens 

<400> 7 

catgaccgtg gagcagaacg tgctgcagca gagcgcggcg cagaagcacc agcagacgtt 60
tttgaatcaa ctgagagaaa ttacggggat taatgacacc cagatactac agcaagcctt 120
gaaggatagt aatggaaact tggaattagc agtggctttc cttactgcga agaatgctaa 180
gacccctcag caggaggaga caacttacta ccaaacagca cttcctggca atgatagata 240 
catcagtgtg ggaagccaag cagatacaaa tgtgattgat ctcactggag atgataaaga 300 
tgatcttcag agagcaattg ccttgagttt ggccgaatca aacagggcat tcagggagac 360
tggaataact gatgaggaac aagccattag cagagttctt gaagccagca tagcagagaa 420
taaagcatgt ttgaagagga cacctacaga agtttggagg gattctcgaa acccttatga 480 
tagaaaaaga caggacaaag ctcccgttgg gctaaagaat gttggcaata cttgttggtt 540 
tagtgctgtt attcagtcat tatttaatct tttggaattt agaagattag ttctgaatta 600 
caagcctcca tcaaatgctc aagatttacc ccgaaaccaa aaggaacatc ggaatttgcc 660
ttttatgcgt gagctgaggt atctatttgc acttcttgtt ggtaccaaaa ggaagtatgt 720
tgatccatca agagcagttg aaattcttaa ggatgctttc aaatcaaatg actcacagca 780
gcaagatgtg agtgagttta cacacaaatt attagattgg ttagaagatg ccttccaaat 840 
gaaagctgaa gaggagacgg atgaagagaa gccaaagaac cccatggtag agttgttcta 900
tggcagattc ctggctgtgg gagtacttga aggtaaaaaa tttgaaaaca ctgaaatgtt 960 
tggtcagtac ccacttcagg tcaatgggtt caaagatctg catgagtgcc tagaagctgc 1020
aatgattgaa ggagaaattg agtctttaca ttcagagaat tcaggaaaat caggccaaga 1080

gcattggttt actgaattac cacctgtgtt aacatttgaa ttgtcaagat ttgaatttaa 1140 

tcaggcattg ggaagaccag aaaaaattca caacaaatta gaatttcccc aagttttata 1200 

tttggacaga tacatgcaca gaaacagaga aataacaaga attaagaggg aagagatcaa 1260 

gagactgaaa gattacctca cggtattaca acaaaggcta gaaagatatt taagctatgg 1320 

ttccggtccc aaacgattcc ccttggtaga tgttcttcag tatgcattgg aatttgcctc 1380 

aagtaaacct gtttgcactt ctcctgttga cgatattgac gctagttccc cacctagtgg 1440 

ttccatacca tcacagacat taccaagcac aacagaacaa cagggagccc tatcttcaga 1500 

actgccaagc acatcacctt catcagttgc tgccatttca tcgagatcag taatacacaa 1560 

accatttact cagtcccgga tacctccaga tttgcccatg catccggcac caaggcacat 1620 

aacggaggaa gaactttctg tgctggaaag ttgtttacat cgctggagga cagaaataga 1680 

aaatgacacc agagatttgc aggaaagcat atccagaatc catcgaacaa ttgaattaat 1740 

gtactctgac aaatctatga tacaagttcc ttatcgatta catgccgttt tagttcacga 1800 

aggccaagct aatgctgggc actactgggc atatattttt gatcatcgtg aaagcagatg 1860 

gatgaagtac aatgatattg ctgtgacaaa atcatcatgg gaagagctag tgagggactc 1920 

ttttggtggt tatagaaatg ccagtgcata ctgtttaatg tacataaatg ataaggcaca 1980 

gttcctaata caagaggagt ttaataaaga aactgggcag ccccttgttg gtatagaaac 2040 

attaccaccg gatttgagag attttgttga ggaagacaac caacgatttg aaaaagaact 2100 

agaagaatgg gatgcacaac ttgcccagaa agctttgcag gaaaagcttt tagcgtctca 2160 

gaaattgaga gagtcagaga cttctgtgac aacagcacaa gcagcaggag acccagaata 2220 

tctagagcag ccatcaagaa gtgatttctc aaagcacttg aaagaagaaa ctattcaaat 2280 

aattaccaag gcatcacatg agcatgaaga taaaagtcct gaaacagttt tgcagtcggc 2340 

aattaagttg gaatatgcaa ggttggttaa gttggcccaa gaagacaccc caccagaaac 2400 

cgattatcgt ttacatcatg tagtggtcta ctttatccag aaccaggcac caaagaaaat 2460 

tattgagaaa acattactag aacaatttgg agatagaaat ttgagttttg atgaaaggtg 2520 

tcacaacata atgaaagttg ctcaagccaa actggaaatg ataaaacctg aagaagtaaa 2580 

cttggaggaa tatgaggagt ggcatcagga ttataggaaa ttcagggaaa caactatgta 2640 

tctcataatt gggctagaaa attttcaaag agaaagttat atagattcct tgctgttcct 2700 

catctgtgct tatcagaata acaaagaact cttgtctaaa ggcttataca gaggacatga 2760 

tgaagaattg atatcacatt atagaagaga atgtttgcta aaattaaatg agcaagccgc 2820 

agaactcttc gaatctggag aggatcgaga agtaaacaat ggtttgatta tcatgaatga 2880 

gtttattgtc ccatttttgc cattattact ggtggatgaa atggaagaaa aggatatact 2940 

agctgtagaa gatatgagaa atcgatggtg ttcctacctt ggtcaagaaa tggaaccaca 3000 

cctccaagaa aagctgacag attttttgcc aaaactgctt gattgttcta tggagattaa 3060 

aagtttccat gagccaccga agttaccttc atattccacg catgaactct gtgagcgatt 3120 

tgcccgaatc atgttgtccc tcagtcgaac tcctgctgat ggaagataa 3169 